Abstract: We give a comprehensive self-contained review of the rigorous
analysis of the thermodynamics of a class of random spin systems of
mean field type whose most prominent example is the Hopfield model.
We focus on the low temperature phase and the analysis of the Gibbs
measures with large deviation techniques. There is a very detailed and
complete picture in the regime of "small $\alpha$"; a particularly satisfactory
result concerns a non-trivial regime of parameters in which we prove
1) the convergence of the local "mean fields" to Gaussian random variables
with constant variance and random mean; the random means
are from site to site independent Gaussians themselves; 2) "propagation
of chaos", i.e. factorization of the extremal infinite volume Gibbs
measures, and 3) the correctness of the "replica symmetric solution"
of Amit, Gutfreund and Sompolinsky [AGS]. This last result was first
proven by M. Talagrand [T4], using different techniques.

Intuition cannot give us rigour, nor even certainty; this has been
recognized more and more.

Henri Poincaré,
"La Valeur de la Science"

1. Introduction

Twenty years ago, Pastur and Figotin [FP1,FP2] first introduced
and studied what has become known as the Hopfield model, which
has turned out, over the years, to be one of the more successful and
important models of a disordered system. This is also reflected in the 
fact that several contributions in this book are devoted to it. The Hop- 
field model is quite versatile and models various situations: Pastur and 
Figotin introduced it as a simple model for a spin glass, while Hopfield, 
in 1982, independently considered it as a model for associative mem- 
ory. The first viewpoint naturally put it in the context of equilibrium 
statistical mechanics, while Hopfield's main interest was its dynamics. 
But the great success of what became known as the Hopfield model
came from the realization, mainly in the work of Amit, Gutfreund,
and Sompolinsky [AGS], that a more complicated version of this model
is reminiscent of a spin glass, and that the (then) recently developed
methods of spin-glass theory, in particular the replica trick and Parisi's
replica symmetry breaking scheme, could be adapted to this model.
These methods allowed a "complete" analysis of the equilibrium
statistical mechanics of the model and made it possible to recover some
of its most prominent "experimentally" observed features, like the
"storage capacity" and the "loss of memory", in a precise analytical way.
This observation sparked a surge of interest among theoretical physicists
in neural network theory in general that has led to considerable progress
in the field (the literature on the subject is extremely rich, and there
are a great number of good review papers; see for example
[A,HKP,GM,MR,DHS]). We will not review this development here. In spite
of their success, the methods used in the analysis by theoretical
physicists were of a heuristic nature and involved mathematically
unjustified procedures, and it may not be too unfair to say that they do
not really provide a deeper understanding of what is really going on in
these systems. Mathematicians and mathematical physicists were late in
entering this field; as a matter of fact, spin glass theory was (and is)
considered a field difficult, if not impossible, to access by rigorous
mathematical techniques.

As is demonstrated in this book, in the course of the last decade the 
attitude of at least some mathematicians and mathematical physicists 
towards this field has changed, and some now consider it as a major
challenge to be faced rather than a nuisance to be avoided. And already,
substantial progress in a rigorous mathematical sense has begun to be 
made. The Hopfield model has been for us the focal point of attention in 
this respect over the last five years and in this article we will review the 
results obtained by us in this spirit. Our approach to the model may be 
called "generalized random mean field models", and is in spirit close to
large deviation theory. We will give a precise outline of this general
setting in the next section. Historically, our basic approach can be traced
back even to the original papers by Pastur and Figotin. In this setting,
the "number of patterns", M, or rather its relation to the system size
N, is a crucial parameter, and the larger it is, the more difficult things
become. The case where M is strictly bounded could be termed
"standard disordered mean field", and it is this type of model that
was studied by Pastur and Figotin in 1977, the case of two patterns
having been introduced by Luttinger [Lut] shortly before that. Such
"site-disorder" models were studied again intensely some years later by 
a number of people, emphasizing applications of large deviation methods
[vHvEC,vH1,GK,vHGHK,vH2,AGS2,JK,vEvHP]. A general large
deviation theory for such systems was obtained by Comets [Co] somewhat
later. This was far from the "physically" interesting case where
the ratio between M and N, traditionally called $\alpha$, is a finite positive
number [Ho,AGS]. The approach of Grensing and Kühn [GK],
which could be described as the most straightforward generalization of
the large deviation analysis of the Curie-Weiss model by combinatorial
computation of the entropy (see Ellis' book [El] for a detailed exposition),
was the first to be generalized to unbounded M by Koch and
Piasko [KP] (but see also [vHvE]). Although their condition on M,
namely $M < \frac{\ln N}{\ln 2}$, was quite strong, until 1992 this remained the only
rigorous result on the thermodynamics of the model with an unbounded
number of patterns, and their analysis involved for the first time a
non-trivial control on fluctuations of a free energy functional. Within their
framework, however, the barrier $\ln N$ appeared insurmountable, and
some crucial new ideas were needed. They came in two almost simultaneous
papers by Shcherbina and Tirozzi [ST] and Koch [K]. They
proved that the free energy of the Hopfield model in the thermodynamic
limit is equal to that of the Curie-Weiss model, provided only
that $\lim_{N\uparrow\infty} \frac{M}{N} = 0$, without condition on the speed of convergence. In
their proof this fact was linked to the convergence in norm of a certain
random matrix constructed from the patterns to the identity matrix.
Control on this matrix proved one key element in further progress.
Building on this observation, in a paper with Picco [BGP1] we were
able to give a construction of the extremal Gibbs states under the same
hypothesis, and even to get first results on the Gibbs states in the case
$\frac{M}{N} = \alpha < 1$. Further progress in this latter case, however, required
yet another key idea: the use of exponential concentration of measure
estimates. Variance estimates based on the Yurinskii martingale construction
had already appeared in [ST], where they were used to prove
self-averaging of the free energy. With Picco [BGP3] we proved exponential
estimates on "local" free energies and used them to show that
disjoint Gibbs states corresponding to all patterns can be constructed
for small enough $\alpha$. A considerable refinement of this analysis that
included a detailed analysis of the local minima near the Mattis states
[Ma] was given in a later paper by the present authors [BG5]. The
result is a fairly complete and rigorous picture of the Gibbs states and
even metastable states in the small $\alpha$ regime, which is in good agreement
with the heuristic results of [AGS]. During the preparation of
this manuscript, a remarkable breakthrough was obtained by Michel
Talagrand [T4]. He succeeded in proving that, in a certain (nontrivial)
range of the parameters $\beta$ and $\alpha$, the validity of the "replica symmetric
solution" of [AGS] can be rigorously justified. It turns out that a result
obtained in [BG5] can be used to give an alternative proof that
also yields some complementary information and in particular allows
us to analyse the convergence properties of the Gibbs measures in that
regime. We find it particularly pleasant that, 10 years after the paper
by Amit et al., we can present this development in this review.

In the present paper we will give a fairly complete and streamlined 
version of our approach, emphasizing generalizations beyond the stan- 
dard Hopfield model, even though we will not work out all the details 
at every point. We have tried to give proofs that are either simpler or
more systematic than the original ones and believe we have succeeded
to some extent. At some places technical proofs that we were not able
to improve substantially are omitted and reference is made to the orig- 
inal papers. In Section 2 we present a derivation of the Hopfield model 
as a mean field spin glass, introduce the concept of generalized random 
mean field models and discuss the thermodynamic formalism for such 
systems. We point out some popular variants of the Hopfield model 
and place them in this general framework. Section 3 discusses some 
necessary background on large deviations, emphasizing calculational 
aspects. This section is quite general and can be regarded as com- 
pletely independent from particular models. Section 4 brings the last 
proof on exponential estimates on maximal and minimal eigenvalues 
of some matrices that are used throughout in the sequel. In Section 
5 we show how large deviation estimates lead to estimates on Gibbs 
measures. Here the theme of concentration of measure appears in a
crucial way. Sections 6 and 7 are devoted to the study of
the function $\Phi$ that emerged from Section 3 as a crucial instrument to
control large deviations. Section 8, finally, gives a rigorous derivation
of the replica symmetric solution of [AGS] in an appropriate range of
parameters, and the construction of the limiting distribution of the
Gibbs measures (the "metastate" in the language of [NS]).

There are a number of other results on the Hopfield model that we 
do not discuss. We never talk here about the high temperature phase, 
and we also exclude the study of the zero temperature case. Also we
do not speak about the case $\alpha = 0$ but will always assume $\alpha > 0$. However,
all proofs work also when $\frac{M}{N} \downarrow 0$, with some trivial modifications
necessary when M(N) remains bounded or grows slowly. In this situation
some more refined results, like large deviation principles [BG4]
and central limit theorems [Gl], can be obtained. Such results will be
covered in other contributions to this volume.

Acknowledgements. We are grateful to Michel Talagrand for sending 
us copies of his work, in particular [T4] prior to publication. This 
inspired most of Section 8. We also are indebted to Dima Ioffe for 
suggesting at the right moment that the inequalities in [BL] could be 
the right tool to make use of Theorem 8.1. This proved a key idea. We 
thank Aernout van Enter for a careful reading of the manuscript and 
numerous helpful comments.

2. Generalized random mean field models



This section introduces the general setup of our approach, including 
a definition of the concept of "generalized random mean field model" 
and the corresponding thermodynamic formalism. But before giving 
formal definitions, we will show how such a class of models and the 
Hopfield model in particular arises naturally in the attempt to con- 
struct mean field models for spin glasses, or to construct models of 
autoassociative memory. 

2.1. The Hopfield model as a mean field spin glass. 

The derivation we are going to present does not follow the historical
development. In fact, what is generally considered "the" mean field
spin glass model, the Sherrington-Kirkpatrick model [SK], is different
(although, as we will see, related) and not even, according to the
definition we will use, a mean field model (a fact which may explain why
it is so much harder to analyse than its inventors apparently expected,
and which in many ways makes it much more interesting). What do we
mean by "mean field model"? A spin system on a lattice is, roughly,
given by a lattice, typically $\mathbb{Z}^d$, a local spin space $\mathcal{S}$, which could be
some Polish space but which for the present we can think of as the discrete
set $\mathcal{S} = \{-1, +1\}$, the configuration space $\mathcal{S}_\infty = \mathcal{S}^{\mathbb{Z}^d}$ and its finite
volume subspaces $\mathcal{S}_\Lambda = \mathcal{S}^\Lambda$ for any finite $\Lambda \subset \mathbb{Z}^d$, and a Hamiltonian
function $H$ that for any finite $\Lambda$ gives the energy of a configuration
$\sigma \in \mathcal{S}_\infty$ in the volume $\Lambda$, as $H_\Lambda(\sigma)$. We will say that a spin system
is a mean field model if its Hamiltonian depends on $\sigma$ only through a
set of so-called macroscopic functions or order parameters. By this we
mean typically spatial averages of local functions of the configuration.
If the mean field model is supposed to describe reasonably well a given
spin system, a set of such functions should be used so that their equilibrium
values suffice to characterize completely the phase diagram of
the model. For instance, for a ferromagnetic spin system it suffices to
consider the total magnetization in a volume $\Lambda$, $m_\Lambda(\sigma) = \frac{1}{|\Lambda|}\sum_{i\in\Lambda}\sigma_i$,
as order parameter. A mean field Hamiltonian for a ferromagnet is
then $H^{fm}_\Lambda(\sigma) = -|\Lambda| E(m_\Lambda(\sigma))$; the physically most natural choice
$E(m) = \frac{1}{2}m^2$ gives the Curie-Weiss model. Note that




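$$H^{fm}_\Lambda(\sigma) = -\sum_{i\in\Lambda} \sigma_i \, \frac{E(m_\Lambda(\sigma))}{m_\Lambda(\sigma)} \tag{2.1}$$

which makes manifest the idea that in this model the spin $\sigma_i$ at the
site $i$ interacts only with the (non-local) mean field $\frac{E(m_\Lambda(\sigma))}{m_\Lambda(\sigma)}$. In the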
Curie-Weiss case this mean field is of course the mean magnetization
itself. Note that the order parameter $m_\Lambda(\sigma)$ measures how close the
spin configuration in $\Lambda$ is to the ferromagnetic ground states $\sigma_i = +1$,
resp. $\sigma_i = -1$. If we wanted to model an antiferromagnet, the
corresponding order parameter would be the staggered magnetization

$$m_\Lambda(\sigma) = \frac{1}{|\Lambda|}\sum_{i\in\Lambda} (-1)^{\sum_{\gamma=1}^{d} i_\gamma}\,\sigma_i .$$
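As a small illustration of the two order parameters just introduced, the following Python sketch evaluates the magnetization and the staggered magnetization on a finite box of $\mathbb{Z}^d$. The function names, the box size `L` and the dictionary representation of a configuration are assumptions made for this example only; they do not appear in the text.

```python
import itertools

def magnetization(sigma):
    """m = (1/|Lambda|) * sum_i sigma_i for a configuration given as {site: spin}."""
    return sum(sigma.values()) / len(sigma)

def staggered_magnetization(sigma):
    """(1/|Lambda|) * sum_i (-1)^(i_1 + ... + i_d) * sigma_i."""
    return sum((-1) ** sum(site) * spin for site, spin in sigma.items()) / len(sigma)

# Hypothetical example: a 4 x 4 box in dimension d = 2.
L, d = 4, 2
sites = list(itertools.product(range(L), repeat=d))
ferro = {site: 1 for site in sites}                    # ferromagnetic ground state
neel = {site: (-1) ** sum(site) for site in sites}     # checkerboard (antiferromagnetic) state

print("ferro:", magnetization(ferro), staggered_magnetization(ferro))
print("neel: ", magnetization(neel), staggered_magnetization(neel))
```

Each order parameter is maximal on the state it is designed to detect and vanishes on the other one, which is the sense in which an order parameter measures closeness to a given ground state.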

In general, a natural choice for a set of order parameters will be
given by the projections of the spin configurations to the ground states
of the system. By ground states we mean configurations $\sigma$ that for all
$\Lambda$ minimize the Hamiltonian $H_\Lambda$ in the sense that $H_\Lambda(\sigma)$ cannot be
made smaller by changing $\sigma$ only within $\Lambda$.¹ So if $\xi^1, \ldots, \xi^M$ are the
ground states of our system, we should define the M order parameters

$$m^1_\Lambda(\sigma) = \frac{1}{|\Lambda|}\sum_{i\in\Lambda}\xi^1_i\sigma_i, \quad \ldots, \quad m^M_\Lambda(\sigma) = \frac{1}{|\Lambda|}\sum_{i\in\Lambda}\xi^M_i\sigma_i$$

and take as a Hamiltonian a function $H^{mf}_\Lambda(\sigma) = -|\Lambda| E\left(m^1_\Lambda(\sigma), \ldots, m^M_\Lambda(\sigma)\right)$. For
consistency, one should of course choose $E$ in such a way that $\xi^1, \ldots, \xi^M$
are ground states of the so defined $H^{mf}_\Lambda(\sigma)$. We see that in this spirit,
the construction of a mean field model starts from assumptions on
the ground states of the real model.

Next we should say what we mean by "spin glass". This is a more
complicated issue. The generally accepted model for a lattice spin
glass is the Edwards-Anderson model [EA], in which Ising spins on a
lattice $\mathbb{Z}^d$ interact via nearest-neighbour couplings $J_{ij}$ that are independent
random variables with zero mean. Little is known about the
low-temperature properties of this model on a rigorous level, and even
on the heuristic level there are conflicting opinions, and it will be difficult
to find consensus within a reasonably large crowd of experts on
what should be reasonable assumptions on the nature of ground states
in a spin glass. But there will be some that would agree on the two
following features, which should hold in high enough dimension:²

(1) The ground states are "disordered". 

(2) The number of ground states is infinite. 

¹ We are somewhat too simplistic here. The notion of ground states should in
general not only be applied to individual configurations but rather to measures on
configuration space (mainly to avoid the problem of local degeneracy); however, we
will ignore such complications here.

² For arguments in favour of this, see e.g. [BF,vE]; for a different view see e.g. [FH].

Moreover, the most "relevant" ground states should be stationary
random fields, although not much more can be said a priori on their
distribution. Starting from these assumptions, we should choose some
function $M(\Lambda)$ that tends to infinity as $\Lambda \uparrow \mathbb{Z}^d$ and $M(\Lambda)$ random vectors
$\xi^\mu$, defined on some probability space $(\Omega, \mathcal{F}, P)$ and taking values
in $\mathcal{S}_\infty$, and define, for all $\omega \in \Omega$, an $M(\Lambda)$-dimensional vector of order
parameters with components

$$m^\mu_\Lambda[\omega](\sigma) = \frac{1}{|\Lambda|}\sum_{i\in\Lambda}\xi^\mu_i[\omega]\sigma_i \tag{2.2}$$

and finally choose the Hamiltonian as some function of this vector.
The most natural choice in many ways is 



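$$H_\Lambda[\omega](\sigma) = -\frac{|\Lambda|}{2}\left\|m_\Lambda[\omega](\sigma)\right\|_2^2 = -\frac{|\Lambda|}{2}\sum_{\mu=1}^{M(\Lambda)}\left[m^\mu_\Lambda[\omega](\sigma)\right]^2 = -\frac{1}{2|\Lambda|}\sum_{i,j\in\Lambda}\sum_{\mu=1}^{M(\Lambda)}\xi^\mu_i\xi^\mu_j\sigma_i\sigma_j \tag{2.3}$$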



If we make the additional assumption that the random variables
$\xi^\mu_i$ are independent and identically distributed with $P[\xi^\mu_i = \pm 1] = \frac{1}{2}$,
we have obtained exactly the Hopfield model [Ho] in its most standard
form.³ Note that at this point we can replace without any loss $\Lambda$ by
the set $\{1, \ldots, N\}$. Note also that many of the most common variants
of the Hopfield model are simply obtained by a different choice of the
function $E(m)$ or by different assumptions on the distribution of $\xi$.
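The equality between the order-parameter form and the pairwise form of the Hamiltonian (2.3) can be checked numerically. The sketch below is only an illustration under assumed names (`sample_patterns`, `overlaps`, `energy_from_overlaps`, `energy_from_couplings` are hypothetical helpers, and the sizes N = 30, M = 3 are arbitrary); it uses i.i.d. symmetric $\pm 1$ patterns as in the standard Hopfield model.

```python
import random

def sample_patterns(n_sites, n_patterns, rng):
    """Draw i.i.d. symmetric +/-1 patterns, stored as patterns[mu][i]."""
    return [[rng.choice((-1, 1)) for _ in range(n_sites)]
            for _ in range(n_patterns)]

def overlaps(patterns, sigma):
    """Order parameters m^mu = (1/N) * sum_i xi_i^mu * sigma_i, as in (2.2)."""
    n = len(sigma)
    return [sum(x * s for x, s in zip(xi, sigma)) / n for xi in patterns]

def energy_from_overlaps(patterns, sigma):
    """H = -(N/2) * sum_mu (m^mu)^2, the order-parameter form in (2.3)."""
    n = len(sigma)
    return -0.5 * n * sum(m * m for m in overlaps(patterns, sigma))

def energy_from_couplings(patterns, sigma):
    """H = -(1/(2N)) * sum_{i,j} sum_mu xi_i^mu xi_j^mu sigma_i sigma_j."""
    n = len(sigma)
    total = 0
    for i in range(n):
        for j in range(n):
            coupling = sum(xi[i] * xi[j] for xi in patterns)  # Hebbian sum over patterns
            total += coupling * sigma[i] * sigma[j]
    return -total / (2 * n)

rng = random.Random(0)
xi = sample_patterns(n_sites=30, n_patterns=3, rng=rng)
sigma = [rng.choice((-1, 1)) for _ in range(30)]
print(energy_from_overlaps(xi, sigma), energy_from_couplings(xi, sigma))
```

The order-parameter form costs of order N M operations per evaluation, against N² M for the pairwise form, which is one practical reason to work with the overlaps directly.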

In the light of what we said before we should check whether this
choice was consistent, i.e. whether the ground states of the Hamiltonian
(2.3) are indeed the vectors $\xi^\mu$, at least with probability tending
to one. This will depend on the behavior of the function M(N). From
what is known today, in a strict sense this is true only if $M(N) < c\frac{N}{\ln N}$
[McE,Mar], whereas under a mild relaxation (allowing deviations that
are invisible on the level of the macroscopic variables $m_N$), this holds
as long as $\lim_{N\uparrow\infty}\frac{M(N)}{N} = 0$ [BGP1]. It does not hold for faster growing
M(N) [Lu]. Conversely, one might ask whether for given
$M(\Lambda)$ consistency can be reached by the choice of a different distribution
$P$. This seems an interesting, and to our knowledge completely
uninvestigated question.

³ Observe that the lattice structure of the set $\Lambda$ plays no role anymore and we
can consider it simply as a set of points.

2.2 The Hopfield model as an autoassociative memory.

Hopfield's purpose when deriving his model was not to model spin
glasses, but to describe the capability of a neural network to act as 
a memory. In fact, the type of interaction for him was more or less 
dictated by assumptions on neural functioning. Let us, however, give 
another, fake, derivation of his model. By an autoassociative memory 
we will understand an algorithm that is capable of associating input 
data to a preselected set of learned patterns. Such an algorithm may 
be deterministic or stochastic. We will generally only be interested in
complex data, i.e. a pattern should contain a large amount of information.
A pattern is thus naturally described as an element of a set
$\mathcal{S}^N$, and a reasonable description of any possible datum $\sigma \in \mathcal{S}^N$ within
that set in relation to the stored patterns $\xi^1, \ldots, \xi^M$ is in terms of its
similarity to these patterns, which is expressed in terms of the vector of
overlap parameters $m(\sigma)$ whose components are $m^\mu(\sigma) = \frac{1}{N}\sum_{i=1}^{N}\xi^\mu_i\sigma_i$.

If we agree that this should be all the information we care about, it 
is natural to construct an algorithm that can be expressed in terms of 
these variables only. A most natural candidate for such an algorithm 
is a Glauber dynamics with respect to a mean field Hamiltonian like 
(2.3). Functioning of the memory is then naturally interpreted by the 
existence of equilibrium measures corresponding to the stored patterns. 
Here the assumptions on the distribution of the patterns are dictated 
by a priori assumptions on the types of patterns one wants to store, and 
the maximal M(N) for which the memory "functions" is called storage 
capacity and should be determined by the theory. In this paper we will 
not say much about this dynamical aspect, mainly because there are 
almost no mathematical results on this. It is clear from all we know 
about Glauber dynamics, that a detailed knowledge of the equilibrium 
distribution is necessary, but also "almost" sufficient to understand the 
main features of the long time properties of the dynamics. These things 
are within reach of the present theory, but only first steps have been 
carried out (See e.g. [MS]). 
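To make the idea of a Glauber dynamics with respect to the Hamiltonian (2.3) concrete, here is a minimal Python sketch of a heat-bath single-spin-flip dynamics that starts from a corrupted copy of a stored pattern. It is an illustration under stated assumptions, not the algorithm analysed in the literature cited above: the function names, the heat-bath flip rule, and the parameter values (N = 200, M = 3, $\beta$ = 4, 20% corruption, 20 sweeps) are all choices made for this example.

```python
import math
import random

def overlaps(patterns, sigma):
    """Overlap vector m^mu = (1/N) * sum_i xi_i^mu * sigma_i."""
    n = len(sigma)
    return [sum(x * s for x, s in zip(xi, sigma)) / n for xi in patterns]

def glauber_dynamics(patterns, sigma, beta, sweeps, rng):
    """Heat-bath single-spin-flip dynamics for H = -(N/2) * sum_mu (m^mu)^2."""
    n, n_patterns = len(sigma), len(patterns)
    sigma = list(sigma)
    m = overlaps(patterns, sigma)
    for _ in range(sweeps * n):
        i = rng.randrange(n)
        field = sum(patterns[mu][i] * m[mu] for mu in range(n_patterns))
        # Energy change if spin i is flipped (includes the self-interaction term).
        delta_h = 2.0 * sigma[i] * field - 2.0 * n_patterns / n
        # Heat-bath rule: flip with probability 1 / (1 + exp(beta * delta_h)).
        if rng.random() < 1.0 / (1.0 + math.exp(beta * delta_h)):
            for mu in range(n_patterns):
                m[mu] -= 2.0 * patterns[mu][i] * sigma[i] / n  # incremental overlap update
            sigma[i] = -sigma[i]
    return sigma, m

rng = random.Random(0)
N, M = 200, 3
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(M)]
# Corrupt the first stored pattern by flipping 20% of its spins.
start = [-x if i < N // 5 else x for i, x in enumerate(patterns[0])]
final_sigma, final_m = glauber_dynamics(patterns, start, beta=4.0, sweeps=20, rng=rng)
print("initial overlaps:", overlaps(patterns, start))
print("final overlaps:  ", final_m)
```

The dynamics only ever consults the overlap vector, which reflects the remark above that the algorithm can be expressed in terms of these variables alone.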

2.3 Definition of generalized random mean field models. 

Having seen how the Hopfield model emerges naturally in the 
framework of mean field theory, we will now introduce a rather general 
framework that allows us to encompass this model as well as numerous
generalizations. We like to call this framework generalized random 
mean field models mainly due to the fact that we allow an unbounded 
number of order parameters, rather than a finite (independent of N) 
one which would fall in the classical setting of mean field theory and 
for which the standard framework of large deviation theory, as outlined 
in Ellis' book [El], applies immediately. 






A generalized random mean field model needs the following ingredients.

(i) A single spin space $\mathcal{S}$ that we will always take to be a subset of
some linear space, equipped with some a priori probability measure
$q$.

(ii) A state space $\mathcal{S}^N$ whose elements we denote by $\sigma$ and call spin
configurations, equipped with the product measure $\prod_i q(d\sigma_i)$.

(iii) The dual space $(\mathcal{S}^N)^{*M}$ of linear maps $\xi^T_{N,M} : \mathcal{S}^N \to \mathbb{R}^M$.

(iv) A mean field potential, which is some real valued function
$E_M : \mathbb{R}^M \to \mathbb{R}$ that we will assume to be

(iv.1) bounded below (w.l.g. $E_M(m) \geq 0$);

(iv.2) in most cases, convex and "essentially smooth", that is, it has
a domain $\mathcal{D}$ with non-empty interior, is differentiable on its
domain, and $\lim_{m\to\partial\mathcal{D}} |\nabla E_M(m)| = +\infty$ (see [Ho]).

(v) An abstract probability space $(\Omega, \mathcal{F}, P)$ and measurable maps
$\xi^T : \Omega \to (\mathcal{S}^{\mathbb{N}})^{*\mathbb{N}}$. Note that if $\Pi_N$ is the canonical projection $\mathbb{R}^{\mathbb{N}} \to \mathbb{R}^N$,
then $\xi^T_{N,M}[\omega] \equiv \Pi_M \xi^T[\omega] \Pi_N^{-1}$ are random elements of $(\mathcal{S}^N)^{*M}$.

(vi) The random order parameter

$$m_{N,M}[\omega](\sigma) \equiv \frac{1}{N}\xi^T_{N,M}[\omega]\sigma \in \mathbb{R}^M \tag{2.4}$$

(vii) A random Hamiltonian

$$H_{N,M}[\omega](\sigma) \equiv -N E_M\left(m_{N,M}[\omega](\sigma)\right) \tag{2.5}$$

Remark. The formulation above corresponds to what in large deviation
theory is known as "level 1", i.e. we consider the Hamiltonian as a
function of order parameters that are functions ("empirical averages")
rather than as a function of empirical measures as in a "level 2" formulation.
In some cases a level 2 formulation would be more natural,
but since in our main examples everything can be done on level 1, we 
prefer to stick to this language. 

With these objects we define the finite volume Gibbs measures
(which more precisely are probability measure valued random variables)
$\mu_{\beta,N,M}$ on $(\mathcal{S}^N, \mathcal{B}(\mathcal{S}^N))$ through

$$\mu_{\beta,N,M}[\omega](d\sigma) \equiv \frac{e^{-\beta H_{N,M}[\omega](\sigma)}}{Z_{\beta,N,M}[\omega]} \prod_{i=1}^{N} q(d\sigma_i) \tag{2.6}$$

where the normalizing factor, called partition function, is

$$Z_{\beta,N,M}[\omega] \equiv \mathbb{E}_\sigma e^{-\beta H_{N,M}[\omega](\sigma)} \tag{2.7}$$

where $\mathbb{E}_\sigma$ stands for the expectation with respect to the a priori product
measure on $\mathcal{S}^N$. Due to the special feature of these models that
$H_{N,M}[\omega]$ depends on $\sigma$ only through $m_{N,M}[\omega](\sigma)$, the distributions of
these quantities contain essentially all information on the Gibbs measures
themselves (i.e. the measures $\mu_{\beta,N,M}[\omega]$ restricted to the level
sets of the functions $m_{N,M}[\omega]$ are the uniform distribution on these
sets) and thus play a particularly prominent role. They are measures
on $(\mathbb{R}^M, \mathcal{B}(\mathbb{R}^M))$ and we will call them induced measures and denote
them by

$$Q_{\beta,N,M}[\omega] \equiv \mu_{\beta,N,M}[\omega] \circ \left(\frac{1}{N}\xi^T_{N,M}[\omega]\right)^{-1} \tag{2.8}$$
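For very small systems the objects in (2.6) to (2.8) can be computed by brute force, which may help to fix ideas. The following Python sketch enumerates all $2^N$ configurations of the standard Hopfield model (quadratic potential, symmetric Bernoulli a priori measure) and accumulates the induced measure of the overlap vector. The function name `induced_measure`, the choice N = 10, M = 2, and the use of the integer vector $N m$ as dictionary key are assumptions made for this illustration.

```python
import itertools
import math
import random

def induced_measure(patterns, beta):
    """Return (Z, Q) for the standard Hopfield model by exhaustive enumeration.

    Q maps the integer vector N*m (one entry per pattern) to its probability
    under the Gibbs measure, i.e. it is the induced measure of the overlaps.
    """
    n = len(patterns[0])
    weights = {}
    for sigma in itertools.product((-1, 1), repeat=n):
        key = tuple(sum(x * s for x, s in zip(xi, sigma)) for xi in patterns)
        energy = -sum(k * k for k in key) / (2.0 * n)   # H = -(N/2) * ||m||^2
        weight = math.exp(-beta * energy) * 0.5 ** n    # Boltzmann factor times a priori weight
        weights[key] = weights.get(key, 0.0) + weight
    z = sum(weights.values())                           # partition function, cf. (2.7)
    return z, {key: w / z for key, w in weights.items()}

rng = random.Random(1)
patterns = [[rng.choice((-1, 1)) for _ in range(10)] for _ in range(2)]
z, q = induced_measure(patterns, beta=1.5)
print("Z =", z)
print("most likely N*m:", max(q, key=q.get))
```

Since the Boltzmann weight depends on the configuration only through the key, all configurations sharing a key receive the same weight, which is the uniformity on level sets mentioned above (for the uniform a priori measure used here).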

In the classical setting of mean field theory, N would now be con- 
sidered as the large parameter tending to infinity while M would be 
some constant number, independent of N. The main new feature here 
is that both N and M are large parameters and that as N tends to 
infinity, we choose M = M(N) as some function of N that tends to 
infinity as well. However, we stress that the entire approach is geared 
to the case where at least M(N) < N, and even $M(N)/N = \alpha$ is small.
In fact, the passage to the induced measures $Q$ appears reasonably motivated
only in this case, since only then do we work in a space of lower
dimension. To study e.g. the Hopfield model for $\alpha$ large will require
entirely different ideas which we do not have.

It may be worthwhile to make some remarks on randomness and self
averaging at this point in a somewhat informal way. As was pointed out
in [BGP1], the distribution $Q$ of the order parameters can be expected
to be much less "random" than the distribution of the spins. This is
to be understood in a rather strong sense: Define

$$f_{\beta,N,M,\rho}[\omega](m) \equiv -\frac{1}{\beta N}\ln Q_{\beta,N,M}[\omega]\left(B_\rho(m)\right) \tag{2.9}$$

where $B_\rho(m) \subset \mathbb{R}^M$ is the ball of radius $\rho$ centered at $m$. Then by
strong self-averaging we mean that (for suitably chosen $\rho$) $f$ as a function
of $m$ is everywhere "close" to its expectation with probability close
to one (for N large). Such a fact holds in a sharp sense when M is
bounded, but it remains "essentially" true as long as $M(N)/N \downarrow 0$.
(This statement will be made precise in Section 6.) This is the reason
why under this hypothesis, these systems actually behave very much
like ordinary mean field models. When $\alpha > 0$, what "close" can mean
will depend on $\alpha$, but for small $\alpha$ this will be controllable. This is the
reason why it will turn out to be possible to study the situation with
small $\alpha$ as a perturbation of the case $\alpha = 0$.

2.4 Thermodynamic limits 

Although in some sense "only finite volume estimates really count" , 
we are interested generally in asymptotic results as N (and M) tend to 
infinity, and it is suitable to discuss in a precise way the corresponding 
procedure of thermodynamic limits. 

In standard spin systems with short range interactions there is 
a well established beautiful procedure of constructing infinite volume 
Gibbs measures from the set of all finite volume measures (with "bound- 
ary conditions") due to Dobrushin, Lanford and Ruelle (for a good 
exposition see e.g. [Geo]). This procedure cannot be applied in the 
context of mean field models, essentially because the finite volume 
Hamiltonians are not restrictions to finite volume of some formal in- 
finite volume Hamiltonian, but contain parameters that depend in an 
explicit way on the volume N. It is, however, still possible to consider
so-called limiting Gibbs measures obtained as accumulation points of
sequences of finite volume measures. This does require some
discussion.

Observe first that it is of course trivial to extend the finite volume
Gibbs measures $\mu_{\beta,N,M}$ to measures on the infinite product space
$(\mathcal{S}^{\mathbb{N}}, \mathcal{B}(\mathcal{S}^{\mathbb{N}}))$, e.g. by tensoring them with the a priori measures $q$ on the
components $i > N$. Similarly, the induced measures can be extended
to the space $(\mathbb{R}^{\mathbb{N}}, \mathcal{B}(\mathbb{R}^{\mathbb{N}}))$ by tensoring with the Dirac measure concentrated
on 0. One might now be tempted to define the set of limiting
Gibbs measures as the set of limit points, e.g.

$$\mathcal{C}_\beta[\omega] \equiv \mathrm{clus}_{N\uparrow\infty}\left\{\mu_{\beta,N,M(N)}[\omega]\right\} \tag{2.10}$$

where $\mathrm{clus}_{N\uparrow\infty} a_N$ denotes the set of limit points ("cluster set") of the
sequence $a_N$. However, it is easy to see that in general this set is not
rich enough to describe the physical content of the model. E.g., if we
consider the Curie-Weiss model (cf. (2.1)) it is easy to see and well
known that this cluster set would always consist of a single element,
namely the measure $\frac{1}{2}\left(\prod_{i=1}^{\infty} q^{m^*(\beta)} + \prod_{i=1}^{\infty} q^{-m^*(\beta)}\right)$, where $q^a(\sigma_i) = \frac{e^{\beta a\sigma_i}}{2\cosh(\beta a)}$
and where $m^*(\beta)$ is the largest solution of the equation

$$x = \tanh \beta x \tag{2.11}$$
(and which we will have many occasions to meet in the sequel of this
article). If $\beta > 1$, $m^*(\beta) > 0$, and the limiting measure is a mixture; we
would certainly want to be allowed to call the two summands limiting
Gibbs measures as well, and to consider them as extremal, with all
limiting Gibbs measures convex combinations of them. The fact that
more than one such extremal measure exists would be the sign of the
occurrence of a phase transition if $\beta > 1$.
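The largest solution $m^*(\beta)$ of (2.11) is easy to compute numerically. The sketch below uses a plain fixed-point iteration started at $x = 1$, which decreases monotonically to the largest fixed point when $\beta > 1$; for $\beta \leq 1$ the only solution is $x = 0$ and the function returns it directly. The function name `m_star`, the tolerance and the iteration cap are assumptions of this example, and other root finders would serve equally well.

```python
import math

def m_star(beta, tol=1e-13, max_iter=1_000_000):
    """Largest solution of x = tanh(beta * x), found by iterating from x = 1."""
    if beta <= 1.0:
        return 0.0                      # tanh(beta * x) < x for all x > 0
    x = 1.0
    for _ in range(max_iter):
        x_new = math.tanh(beta * x)     # the map is increasing, so the iterates decrease
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x                            # reached only very close to beta = 1

for beta in (0.5, 1.2, 2.0, 4.0):
    print(beta, m_star(beta))
```

The iteration slows down as $\beta$ approaches 1 from above, because the contraction rate $\beta(1 - m^*(\beta)^2)$ tends to 1 there.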

The standard way out of this problem is to consider a richer class
of tilted Gibbs measures

$$\mu^h_{\beta,N,M}[\omega](d\sigma) \equiv \frac{e^{-\beta H_{N,M}[\omega](\sigma) + \beta N h(m_{N,M}[\omega](\sigma))}}{Z^h_{\beta,N,M}[\omega]} \prod_{i=1}^{N} q(d\sigma_i) \tag{2.12}$$

where $h : \mathbb{R}^M \to \mathbb{R}$ is a small perturbation that plays the role of a
symmetry breaking term. In most cases it suffices to choose linear
perturbations, $h(m_{N,M}[\omega](\sigma)) = (h, m_{N,M}[\omega](\sigma))$, in which case $h$ can
be interpreted as a magnetic field. Instead of (2.10) one defines then
the set

$$\tilde{\mathcal{C}}_\beta[\omega] \equiv \mathrm{clus}_{\|h\|_\infty\downarrow 0,\, N\uparrow\infty}\left\{\mu^h_{\beta,N,M(N)}[\omega]\right\} \tag{2.13}$$

where we first consider the limit points that can be obtained for all
$h \in \mathbb{R}^\infty$ and then collect all possible limit points that can be obtained as $h$
is taken to zero (with respect to the sup-norm). Clearly $\mathcal{C}_\beta \subset \tilde{\mathcal{C}}_\beta$. If this
inclusion is strict, this means that the infinite volume Gibbs measures
depend in a discontinuous way on $h$ at $h = 0$, which corresponds to the
standard physical definition of a first order phase transition. We will
call $\tilde{\mathcal{C}}_\beta[\omega]$ the set of limiting Gibbs measures.
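The discontinuity in $h$ at $h = 0$ can be seen numerically in the Curie-Weiss model, where the law of the magnetization under the tilted measure is explicit: the number of configurations with $k$ up spins is a binomial coefficient, and the weight depends on $k$ only. The following Python sketch computes the mean magnetization under a linear tilt; the function name, the system size N = 400 and the field values $\pm 0.01$ are assumptions chosen for illustration.

```python
import math

def curie_weiss_mean_magnetization(beta, h, n):
    """Mean of m under the tilted Curie-Weiss Gibbs measure with field h and n spins."""
    log_weights = []
    values = []
    for k in range(n + 1):                          # k = number of +1 spins
        m = (2 * k - n) / n
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        # Exponent -beta*H + beta*N*h*m with H = -(N/2) m^2; the factor 2^(-N) cancels.
        log_weights.append(log_binom + beta * n * (0.5 * m * m + h * m))
        values.append(m)
    shift = max(log_weights)                        # subtract the maximum to avoid overflow
    weights = [math.exp(lw - shift) for lw in log_weights]
    return sum(m * w for m, w in zip(values, weights)) / sum(weights)

beta, n = 2.0, 400
for h in (0.01, 0.0, -0.01):
    print(h, curie_weiss_mean_magnetization(beta, h, n))
```

For $\beta > 1$ a small field of either sign already pushes the mean magnetization close to $\pm m^*(\beta)$, while at $h = 0$ the mean vanishes by symmetry; this is the finite-volume trace of the mixture described above.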

The set $\tilde{\mathcal{C}}_\beta[\omega]$ will in general not be a convex set. E.g., in the Curie-Weiss
case, it consists, for $\beta > 1$, of three elements, $\mu^+_{\beta,\infty}$, $\mu^-_{\beta,\infty}$, and
$\frac{1}{2}(\mu^+_{\beta,\infty} + \mu^-_{\beta,\infty})$. (Exercise: Prove this statement!) However, we may
still consider the convex closure of this set and call its extremal points
extremal Gibbs measures. It is likely, but we are not aware of a proof,
that all elements of the convex closure can be obtained as limit points
if the limits $N \uparrow \infty$, $\|h\|_\infty \downarrow 0$ are allowed to be taken jointly. (Exercise:
Prove that this is true in the Curie-Weiss model!)

Of course, in the same way we define the tilted induced measures,
and the main aim is to construct, in a more or less explicit way, the
set of limiting induced measures. We denote these sets by $\mathcal{C}^Q_\beta[\omega]$ and
$\tilde{\mathcal{C}}^Q_\beta[\omega]$, respectively. The techniques used will basically be of large deviation
type, with some modifications necessary. We will discuss this
formalism briefly in Sections 3 and 5.

2.5 Convergence and propagation of chaos. 

Here we would like to discuss a little bit the expected or possible
behaviour of generalized random mean field models. Our first remark
is that all the sets $\mathcal{C}_\beta[\omega]$ and $\tilde{\mathcal{C}}_\beta[\omega]$ are non-empty if $\mathcal{S}$ is compact.
The same holds in most cases for $\mathcal{C}^Q_\beta[\omega]$ and $\tilde{\mathcal{C}}^Q_\beta[\omega]$, namely when the
image of $\mathcal{S}^N$ under $\xi^T_{N,M}$ is compact. This may, however, be misleading.
Convergence of a sequence of measures $Q_{\beta,N,M(N)}$ on $(\mathbb{R}^\infty, \mathcal{B}(\mathbb{R}^\infty))$ in
the usual weak sense means simply convergence of all finite dimensional
marginals. Now take the sequence $\delta_{e_{M(N)}}$ of Dirac measures
concentrated on the $M(N)$-th unit vector in $\mathbb{R}^\infty$. Clearly, this sequence
converges to the Dirac measure concentrated on zero, and this observation
obviously misses a crucial point about this sequence. Considered
rather as a measure on the set of unit vectors, this sequence clearly
does not converge. For most purposes it is thus more appropriate to use
an $\ell^2$-topology rather than the more conventional product topology. In
this sense, the above sequence of Dirac measures does, of course, not
converge weakly, but converges vaguely to the zero measure.

It is an interesting question whether one can expect, in a random
situation, that there exist subsequences of untilted measures converging
weakly in the $\ell^2$ topology in a phase transition region. Ch. Külske [Ku]
recently constructed an example in which the answer to this question
is negative. He also showed that, as long as $M(N) < \ln N$, in the
standard Hopfield model, the sets of limiting measures obtained without
and with tilting coincide for almost all $\omega$.

In conventional mean field models, the induced measures converge
(if properly arranged) to Dirac measures, implying that in the
thermodynamic limit, the macroscopic order parameters satisfy a law of
large numbers. In the case of infinitely many order parameters, this is
not obviously true, and it may not even seem reasonable to expect, if
$M(N)$ is not considerably smaller than $N$. Indeed, it has been shown
in [BGP1] that in the Hopfield model this holds if $M(N)/N \downarrow 0$.
Another paradigm of mean field theory is propagation of chaos [Sn],
i.e. the fact that the (extremal) limiting Gibbs measures are product
measures, i.e. that any finite subset of spins forms a family of
independent random variables in the thermodynamic limit. In fact, both
historically and in most standard textbooks on statistical mechanics,
this is the starting assumption for the derivation of mean field
theory, while models such as the Curie-Weiss model are just convenient
examples where these assumptions happen to be verified. In the
situation of random models, this is a rather subtle issue, and we will
come back to it in Section 8, where we will actually learn a lot about
it.

2.6 Examples. 

Before turning to the study of large deviation techniques, we
conclude this section by presenting a list of commonly used variants of
the Hopfield model and showing how they fit into the above framework.

2.6.1 The standard Hopfield model. 

Here $S = \{-1,1\}$, and $q$ is the Bernoulli measure
$q(1) = q(-1) = \frac12$. $(S^N)^*$ may be identified with
$\mathbb{R}^N$ and $\xi^T_{N,M}$ are real $M\times N$-matrices. The
mean field potential is $E_M(m) = \frac12\|m\|_2^2$, where
$\|\cdot\|_2$ denotes the 2-norm in $\mathbb{R}^M$. The measure
$\mathbb{P}$ is such that $\xi_i^\mu$ are independent and identically
distributed with $\mathbb{P}[\xi_i^\mu = \pm1] = \frac12$. The order
parameter is the $M$-dimensional vector

$$m_{N,M}[\omega](\sigma) = \frac1N\sum_{i=1}^N \xi_i[\omega]\,\sigma_i$$  (2.14)

and the Hamiltonian results as the one in (2.3).
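As a concrete illustration, the sketch below computes the overlap vector (2.14) and checks that $-N E_M(m)$ agrees with the familiar pair-interaction form of the Hopfield Hamiltonian. It assumes that (2.3) is the usual expression $-\frac{1}{2N}\sum_{i,j}\sum_\mu \xi_i^\mu\xi_j^\mu\sigma_i\sigma_j$ with diagonal terms included; the function names and the sizes $N=50$, $M=3$ are chosen for the example only.

```python
import random

def overlaps(xi, sigma):
    """m_mu = (1/N) * sum_i xi[i][mu] * sigma[i], for mu = 0..M-1."""
    N, M = len(xi), len(xi[0])
    return [sum(xi[i][mu] * sigma[i] for i in range(N)) / N for mu in range(M)]

def hamiltonian_mean_field(xi, sigma):
    """H = -N * E_M(m) with E_M(m) = 0.5 * ||m||_2^2."""
    N = len(xi)
    m = overlaps(xi, sigma)
    return -N * 0.5 * sum(x * x for x in m)

def hamiltonian_pairwise(xi, sigma):
    """H = -(1/(2N)) * sum_{i,j} sum_mu xi_i^mu xi_j^mu sigma_i sigma_j."""
    N, M = len(xi), len(xi[0])
    total = 0.0
    for i in range(N):
        for j in range(N):
            coupling = sum(xi[i][mu] * xi[j][mu] for mu in range(M))
            total += coupling * sigma[i] * sigma[j]
    return -total / (2 * N)

rng = random.Random(0)
N, M = 50, 3
xi = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(N)]
sigma = [xi[i][0] for i in range(N)]  # configuration equal to the first pattern
m = overlaps(xi, sigma)
print(m)
print(hamiltonian_mean_field(xi, sigma), hamiltonian_pairwise(xi, sigma))
```

Choosing the configuration equal to a stored pattern makes the corresponding overlap equal to one, while the remaining overlaps are of order $1/\sqrt{N}$.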

2.6.2 Multi-neuron interactions. 

This model was apparently introduced by Peretto and Niez [PN]
and studied for instance by Newman [N]. Here all is the same as in the
previous case, except that the mean field potential is
$E_M(m) = \frac1p\|m\|_p^p$, $p > 2$. For (even) integer $p$, the
Hamiltonian is then

$$H_{N,M}[\omega](\sigma) = -\frac{1}{pN^{p-1}}\sum_{\mu=1}^M\sum_{i_1,\dots,i_p}\xi^\mu_{i_1}\sigma_{i_1}\cdots\xi^\mu_{i_p}\sigma_{i_p}$$  (2.15)
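The expansion of $-N E_M(m)$ into a $p$-spin interaction can be checked by brute force on a tiny system. The sketch below does this for $p=4$; it assumes the normalization $-\frac{1}{pN^{p-1}}$ that follows from inserting (2.14) into $E_M(m)=\frac1p\|m\|_p^p$, and all names and sizes are illustrative.

```python
import itertools
import random

def overlaps(xi, sigma):
    """m_mu = (1/N) * sum_i xi[i][mu] * sigma[i]."""
    N, M = len(xi), len(xi[0])
    return [sum(xi[i][mu] * sigma[i] for i in range(N)) / N for mu in range(M)]

def hamiltonian_from_potential(xi, sigma, p):
    """H = -N * (1/p) * ||m||_p^p, evaluated through the overlaps."""
    N = len(xi)
    return -N * sum(abs(x) ** p for x in overlaps(xi, sigma)) / p

def hamiltonian_expanded(xi, sigma, p):
    """Same quantity written as an explicit sum over p-tuples of sites."""
    N, M = len(xi), len(xi[0])
    total = 0.0
    for mu in range(M):
        for sites in itertools.product(range(N), repeat=p):
            term = 1.0
            for i in sites:
                term *= xi[i][mu] * sigma[i]
            total += term
    return -total / (p * N ** (p - 1))

rng = random.Random(0)
N, M, p = 5, 2, 4  # kept tiny: the explicit sum has M * N**p terms
xi = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(N)]
sigma = [rng.choice((-1, 1)) for _ in range(N)]
print(hamiltonian_from_potential(xi, sigma, p), hamiltonian_expanded(xi, sigma, p))
```

The explicit sum grows like $N^p$, which is why the overlap representation is the one used in practice.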

2.6.3 Biased Hopfield model. 

Everything is the same as in 2.6.1, but the distribution of
$\xi_i^\mu$ is supposed to reflect an asymmetry (bias) between $+1$ and
$-1$ (e.g. to store pictures that are typically more black than white).
That is, we have (e.g.) $\mathbb{P}[\xi_i^\mu = 2x] = (1-x)$ and
$\mathbb{P}[\xi_i^\mu = -2(1-x)] = x$. One may, of course, consider the
model with yet different distributions of the $\xi_i^\mu$.

2.6.4 Hopfield model with correlated patterns.

In the same context, also the assumption of independence of the
$\xi_i^\mu$ is not always reasonable and may be dropped. One speaks of
semantic correlation if the components of each vector $\xi^\mu$ are
independent, while the different vectors are correlated, and of spatial
correlation if the different vectors $\xi^\mu$ are independent, but
have correlated components. Various reasons for considering such types
of patterns can be found in the literature [FZ,Mi]. Other types of
correlation considered include the case where $\mathbb{P}$ is the
distribution of a family of Gibbs random fields [SW].

2.6.5 Potts-Hopfield model. 

Here the space $S$ is the set $\{1, 2, \dots, p\}$ for some integer
$p$, and $q$ is the uniform measure on this set. We again have random
patterns $\xi_i^\mu$ that are independent and the marginal distribution
of $\mathbb{P}$ coincides with $q$. The order parameters are defined as

$$m^\mu_{N,M}[\omega](\sigma) = \frac1N\sum_{i=1}^N\left[\delta_{\sigma_i,\xi_i^\mu} - \frac1p\right]$$  (2.16)

for $\mu = 1, \dots, M$. $E_M$ is the same as in the standard Hopfield
model. Note that the definition of $m^\mu_{N,M}$ seems not to fit
exactly our setting. The reader should figure out how this can be
fixed. See also [G1]. A number of other interesting variants of the
model really lie outside our setting. We mention two of them:

2.6.6 The dilute Hopfield model.

Here we are in the same setting as in the standard Hopfield model,
except that the Hamiltonian is no longer a function of the order
parameter. Instead, we need another family of, let us say independent,
random variables $J_{ij}$, with $(i,j)\in\{1,\dots,N\}\times\{1,\dots,N\}$,
with distribution e.g. $\mathbb{P}[J_{ij} = 1] = x$,
$\mathbb{P}[J_{ij} = 0] = 1 - x$, and the Hamiltonian is

$$H_{N,M}[\omega](\sigma) = -\frac{1}{2Nx}\sum_{i,j}\sigma_i\sigma_j J_{ij}\sum_{\mu=1}^M\xi_i^\mu\xi_j^\mu$$  (2.17)

This model describes a neural network in which each neuron interacts
only with a fraction $x$ of the other neurons, with the set of a priori
connections between neurons described as a random graph [BG1,BG2].
This is certainly a more realistic assumption when one is modelling
biological neural networks like the brain of a rat. The point here is
that, while this model is not a generalized mean field model, if we
replace the Hamiltonian (2.17) by its average with respect to the
random variables $J$, we get back the original Hopfield Hamiltonian. On
the other hand, it is true that

$$\sup_{\sigma\in S^N}\left|H_{N,M}[\omega](\sigma) - \mathbb{E}\left[H_{N,M}[\omega](\sigma)\,\big|\,\xi\right]\right| \le cN\sqrt{\frac{M}{xN}}$$  (2.18)

with overwhelming probability, which implies that in most respects the
dilute model has the same behaviour as the normal one, provided
$\frac{M}{xN}$ is small. The estimate (2.18) was first proven in [BG2],
but a much simpler proof can be found in [T4].
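The statement that averaging over the dilution variables gives back the Hopfield Hamiltonian is easy to test numerically. The sketch below assumes that the dilute Hamiltonian carries the normalization $\frac{1}{2Nx}$, that the $J_{ij}$ are independent for all ordered pairs $(i,j)$, and that the undiluted Hamiltonian is $-\frac{1}{2N}\sum_{i,j}\sum_\mu\xi_i^\mu\xi_j^\mu\sigma_i\sigma_j$; these are assumptions of the example, as are all function names.

```python
import random

def pattern_coupling(xi, i, j):
    """sum_mu xi_i^mu * xi_j^mu."""
    return sum(a * b for a, b in zip(xi[i], xi[j]))

def hopfield_hamiltonian(xi, sigma):
    """Undiluted Hamiltonian, -(1/(2N)) * sum_{i,j} coupling * sigma_i * sigma_j."""
    N = len(xi)
    total = sum(sigma[i] * sigma[j] * pattern_coupling(xi, i, j)
                for i in range(N) for j in range(N))
    return -total / (2 * N)

def dilute_hamiltonian(xi, sigma, J, x):
    """Diluted Hamiltonian: each bond (i, j) is kept only if J[i][j] == 1."""
    N = len(xi)
    total = sum(sigma[i] * sigma[j] * J[i][j] * pattern_coupling(xi, i, j)
                for i in range(N) for j in range(N))
    return -total / (2 * N * x)

rng = random.Random(1)
N, M, x = 30, 2, 0.5
xi = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(N)]
sigma = [rng.choice((-1, 1)) for _ in range(N)]

H0 = hopfield_hamiltonian(xi, sigma)
# With every bond present and x = 1 the two Hamiltonians coincide exactly.
H_all = dilute_hamiltonian(xi, sigma, [[1] * N for _ in range(N)], 1.0)

# Monte Carlo average over the dilution variables at fixed patterns.
samples = 200
H_avg = 0.0
for _ in range(samples):
    J = [[1 if rng.random() < x else 0 for _ in range(N)] for _ in range(N)]
    H_avg += dilute_hamiltonian(xi, sigma, J, x) / samples
print(H0, H_all, H_avg)
```

The empirical average fluctuates around the Hopfield value; the uniform bound over all configurations is of course a much stronger statement than this single-configuration check.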

2.6.7 The Kac-Hopfield model.

This model looks similar to the previous one, but here some
non-random geometry is introduced. The set $\{1, \dots, N\}$ is replaced
by $\Lambda \subset \mathbb{Z}^d$, and the random $J_{ij}$ by some
deterministic function $J_\gamma(i - j) = \gamma^d J(\gamma(i - j))$
with $J(x)$ some function with bounded support (or rapid decay) whose
integral equals one. Here $\gamma$ is a small parameter. This model had
already been introduced by Figotin and Pastur [FP3] but has been
investigated more thoroughly only recently [BGP2, BGP4]. It shows very
interesting features and an entire article in this volume is devoted to
it.

3. Large deviation estimates and transfer principle



The basic tools to study the models we are interested in are large
deviation estimates for the induced measures $Q_{\beta,N,M}$. Compared
to the standard situations, there are two particularities in the
setting of generalized random mean field models that require some
special attention: (i) the dimension $M$ of the space on which these
measures are defined must be allowed to depend on the basic large
parameter $N$ and (ii) the measure $Q_{\beta,N,M}$ is itself random. A
further aspect is maybe even more important. We should be able to
compute, in a more or less explicit form, the "rate function", or at
least be able to identify its minima. In the setting we are in, this is
a difficult task, and we will stress the calculational aspects here. We
should mention that in the particular case of the Hopfield model with
quadratic interaction, there is a convenient trick, called the
Hubbard-Stratonovich transformation [HS], that allows one to circumvent
the technicalities we discuss here. This trick has been used frequently
in the past, and we shall come back to it in Section 8. The techniques
we present here work in much more generality and give essentially
equivalent results. The central result that will be used later is
Theorem 3.5.

3.1. Large deviations estimates. 

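Let us start with the general large deviation framework adapted to
our setting. Let $M$ and $N$ be two integers. Given a family
$\{\nu_N, N \ge 1\}$ of probability measures on
$(\mathbb{R}^M, \mathcal{B}(\mathbb{R}^M))$, and a function
$E_M : \mathbb{R}^M \to \mathbb{R}$ (hypotheses on $E_M$ will be
specified later on), we define a new family $\{\mu_N, N \ge 1\}$ of
probability measures on $(\mathbb{R}^M, \mathcal{B}(\mathbb{R}^M))$ via

$$\mu_N(dx) = \frac{e^{NE_M(x)}\,d\nu_N(x)}{\int_{\mathbb{R}^M} e^{NE_M(x')}\,d\nu_N(x')}$$  (3.1)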

We are interested in the large deviation properties of this new
family. In the case when $M$ is a fixed integer, it follows from
Varadhan's lemma on the asymptotics of integrals that, if
$\{\nu_N, N \ge 1\}$ satisfies a large deviation principle with good
rate function $I(\cdot)$, and if $E_M$ is suitably chosen (we refer to
[DS], Theorem 2.1.10 and exercise (2.1.24) for a detailed presentation
of these results in a more general setting), then $\{\mu_N, N \ge 1\}$
satisfies a large deviation principle with good rate function $J(x)$
where

$$J(x) = -[E_M(x) - I(x)] + \sup_{y\in\mathbb{R}^M}[E_M(y) - I(y)]$$  (3.2)
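For orientation, formula (3.2) can be evaluated numerically in the simplest case $M = 1$. The sketch below takes the Curie-Weiss situation as an assumed example: $\nu_N$ is the law of the mean of $N$ fair $\pm1$ spins, whose Cramér rate function is $I(x) = \frac{1+x}{2}\ln(1+x) + \frac{1-x}{2}\ln(1-x)$, and $E(x) = \frac{\beta}{2}x^2$. The grid, the value $\beta = 2$ and the function names are choices made for the illustration.

```python
import math

def cramer_rate(x):
    """Rate function I(x) of the mean of N fair +-1 spins, for |x| < 1."""
    return 0.5 * (1 + x) * math.log(1 + x) + 0.5 * (1 - x) * math.log(1 - x)

def energy(x, beta):
    """Quadratic mean field potential E(x) = beta * x^2 / 2."""
    return 0.5 * beta * x * x

def tilted_rate_function(beta, grid):
    """J(x) = -[E(x) - I(x)] + sup_y [E(y) - I(y)], with the sup taken on the grid."""
    values = [energy(y, beta) - cramer_rate(y) for y in grid]
    supremum = max(values)
    return [supremum - v for v in values]

grid = [k / 1000 for k in range(-999, 1000)]
beta = 2.0
J = tilted_rate_function(beta, grid)
j_min, x_min = min(zip(J, grid))
# The minimizers of J solve the mean field equation x = tanh(beta * x).
print(x_min, math.tanh(beta * abs(x_min)))
```

The two symmetric minimizers of $J$ sit, up to the grid resolution, at the non-zero solutions of $x = \tanh(\beta x)$, which is the familiar Curie-Weiss picture below the critical temperature.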






Here we address the question of the large deviation behaviour of
$\{\mu_N, N \ge 1\}$ in the case where $M \equiv M(N)$ is an unbounded
function of $N$ and where the measure $\nu_N$ is defined as follows:

Let $\xi$ be a linear transformation from $\mathbb{R}^N$ to
$\mathbb{R}^M$. To avoid complications, we assume that $M \le N$ and
$\xi$ is non-degenerate, i.e. its image is all of $\mathbb{R}^M$. We
will use the same symbol to denote the corresponding $N \times M$
matrix $\xi \equiv \{\xi_i^\mu\}_{i=1,\dots,N;\,\mu=1,\dots,M}$ and we
will denote by $\xi^\mu \equiv (\xi_i^\mu)_{i=1,\dots,N} \in \mathbb{R}^N$,
respectively $\xi_i \equiv (\xi_i^\mu)_{\mu=1,\dots,M} \in \mathbb{R}^M$,
the $\mu$-th row vector and $i$-th column vector. The transposed matrix
(and the corresponding adjoint linear transformation from
$\mathbb{R}^M$ to $\mathbb{R}^N$) is denoted $\xi^T$. Consider a
probability space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathcal{P})$
and its $N$-fold power $(\mathbb{R}^N, \mathcal{P}_N)$ where
$\mathcal{P}_N = \mathcal{P}^{\otimes N}$. We set

$$\nu_N \equiv \mathcal{P}_N \circ \left(\tfrac1N\xi^T\right)^{-1}$$  (3.3)

In this subsection we will present upper and lower large deviation
bounds for fixed $N$. More precisely we set, for any $\rho > 0$ and
$x^* \in \mathbb{R}^M$,

$$Z_{N,\rho}(x^*) \equiv \int_{B_\rho(x^*)} e^{NE_M(x)}\,d\nu_N(x)$$  (3.4)

In the regime where $\lim_{N\to\infty} \frac MN = 0$, estimates on these
quantities provide a starting point to prove a strong large deviation
principle for $\{\mu_N, N \ge 1\}$ in a formulation that extends the
"classical" Cramér formulation. This was done in [BG4] in the case of
the standard Hopfield model. In the regime where
$\lim_{N\to\infty} \frac MN = \alpha$ with $\alpha > 0$, we can no
longer establish such a LDP. But estimates on $Z_{N,\rho}(x^*)$ will be
used to establish concentration properties for $Q_N$ asymptotically as
$N$ tends to infinity, as we will see later in the paper.

Following the classical procedure, we obtain an upper bound on
$Z_{N,\rho}(x^*)$ by optimizing on a family of exponential Markov
inequalities. As is well known, this will require the computation of
the conjugate⁴ of the logarithmic moment generating function, defined
as

$$\mathcal{L}_{N,M}(t) \equiv \frac1N\log\int_{\mathbb{R}^M} e^{N(t,x)}\,\nu_N(dx), \quad t \in \mathbb{R}^M$$  (3.5)
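In the standard Hopfield model this function has a simple closed form. Assuming that $\mathcal{P}$ is the symmetric Bernoulli measure on $\{-1,1\}$, so that $\nu_N$ is the law of $\frac1N\sum_i \xi_i\sigma_i$ under uniformly distributed spins, one finds $\mathcal{L}_{N,M}(t) = \frac1N\sum_{i=1}^N \ln\cosh((\xi_i,t))$. The sketch below compares this expression with a brute-force sum over all spin configurations of a very small system; names and sizes are illustrative.

```python
import itertools
import math
import random

def log_mgf_closed_form(xi, t):
    """(1/N) * sum_i log cosh((xi_i, t)) for symmetric +-1 spins."""
    N = len(xi)
    fields = [sum(a * b for a, b in zip(row, t)) for row in xi]
    return sum(math.log(math.cosh(h)) for h in fields) / N

def log_mgf_brute_force(xi, t):
    """(1/N) * log of the average over all 2^N configurations of exp(N (t, x)),
    where x = (1/N) * sum_i xi_i * sigma_i, so that N (t, x) = sum_i sigma_i (xi_i, t)."""
    N = len(xi)
    fields = [sum(a * b for a, b in zip(row, t)) for row in xi]
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=N):
        total += math.exp(sum(s * h for s, h in zip(sigma, fields)))
    return math.log(total / 2 ** N) / N

rng = random.Random(0)
N, M = 8, 2  # 2^N configurations are enumerated, so N must stay small
xi = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(N)]
t = [0.3, -0.7]
print(log_mgf_closed_form(xi, t), log_mgf_brute_force(xi, t))
```

The factorization over sites is what makes the moment generating function computable here, in contrast to its conjugate.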



⁴ We have chosen to follow Rockafellar's terminology and speak about
conjugacy correspondence and conjugate of a (convex) function instead
of Legendre-Fenchel conjugate, as is often done. This will allow us to
refer to [Ro] and the classical Legendre transform, avoiding confusions
that might otherwise arise.

In the setting we are in, the computation of this quantity is generally
quite feasible. A recurrent theme in large deviation theory is that of
the Legendre transform. To avoid complications that will not arise in
our examples, we restrict the following discussion mainly to the case
when the Legendre transform is well defined (and involutive), which is
essentially the case where the convex function is strictly convex and
essentially smooth. We recall from [Ro]:

Definition 3.1. A real valued function $g$ on a convex set $C$ is said
to be strictly convex on $C$ if

$$g((1-\lambda)x + \lambda y) < (1-\lambda)g(x) + \lambda g(y), \quad 0 < \lambda < 1$$  (3.6)

for any two different points $x$ and $y$ in $C$. It is called proper if
it is not identically equal to $+\infty$.

An extended-real-valued function $h$ on $\mathbb{R}^M$ is essentially
smooth if it satisfies the following three conditions for
$C = \mathrm{int}(\mathrm{dom}\,h)$:

(a) $C$ is non-empty;

(b) $h$ is differentiable throughout $C$;

(c) $\lim_{i\to\infty} |\nabla h(x_i)| = +\infty$ whenever
$x_1, x_2, \dots$ is a sequence in $C$ converging to a boundary point
$x$ of $C$.

(Recall that $\mathrm{dom}\,g \equiv \{x \in \mathbb{R}^M \mid g(x) < \infty\}$.)
Note that if a function $E_M$ is essentially smooth, it follows (cf.
[RV], Theorems A and B, and [Ro], pp. 263-272) that $E_M$ attains a
minimum value and the set on which this (global) minimum is attained
consists of a single point belonging to the interior of its domain.
Without loss of generality we will assume in the sequel that
$E_M(x) \ge 0$ and $E_M(0) = 0$.

All through this chapter we adopt the usual approach that consists
in identifying a convex function $g$ on $\mathrm{dom}\,g$ with the
convex function defined throughout the space $\mathbb{R}^M$ by setting
$g(x) = +\infty$ for $x \notin \mathrm{dom}\,g$.

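Definition 3.2. Let $g$ be a proper convex function. The function $g^*$
defined by

$$g^*(x^*) = \sup_{x\in\mathbb{R}^M}\{(x,x^*) - g(x)\}$$  (3.7)

is called its (ordinary) conjugate.

For any set $S$ in $\mathbb{R}^M$ we denote by $\mathrm{int}\,S$ its
interior. For smooth $g$ we denote by
$\nabla g(x) \equiv \left(\frac{\partial g(x)}{\partial x_1}, \dots, \frac{\partial g(x)}{\partial x_M}\right)$,
$\nabla^2 g(x) \equiv \left(\frac{\partial^2 g(x)}{\partial x_\mu\partial x_\nu}\right)_{\mu,\nu=1,\dots,M}$
and $\Delta g(x) \equiv \sum_{\mu=1}^M \frac{\partial^2 g(x)}{\partial x_\mu^2}$,
respectively, the gradient vector, the Hessian matrix, and the
Laplacian of $g$ at $x$.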

The following lemma collects some well-known properties of
$\mathcal{L}_{N,M}$ and its conjugate:

Lemma 3.3.

(a) $\mathcal{L}_{N,M}$ and $\mathcal{L}^*_{N,M}$ are proper convex
functions from $\mathbb{R}^M$ to $\mathbb{R}\cup\{\infty\}$.

(b) $\mathcal{L}_{N,M}(t)$ is infinitely differentiable on
$\mathrm{int}(\mathrm{dom}\,\mathcal{L}_{N,M})$. Defining the measure
$\tilde\nu_{N,t}$ via

$$d\tilde\nu_{N,t}(X) \equiv \frac{\exp\{N(t,X)\}}{\int \exp\{N(t,X)\}\,d\nu_N(X)}\,d\nu_N(X)$$

and denoting by $E_t(\cdot)$ the expectation w.r.t. $\tilde\nu_{N,t}$,
we have, for any $t$ in $\mathrm{dom}\,\mathcal{L}_{N,M}$,

$$\nabla\mathcal{L}_{N,M}(t) = E_t(X) = \left(E_t(X_\mu)\right)_{\mu=1,\dots,M}$$
$$\frac1N\nabla^2\mathcal{L}_{N,M}(t) = \left(E_t(X_\mu X_\nu) - E_t(X_\mu)E_t(X_\nu)\right)_{\mu,\nu=1,\dots,M}$$  (3.8)

and, if $\mathcal{L}^*$ is smooth, the following three conditions on
$x$ are equivalent:

1) $\nabla\mathcal{L}_{N,M}(t) = x$
2) $\mathcal{L}^*_{N,M}(x) = (t,x) - \mathcal{L}_{N,M}(t)$   (3.9)
3) $(y,x) - \mathcal{L}_{N,M}(y)$ achieves its supremum over $y$ at $y = t$

(c) $\mathcal{L}^*_{N,M}(x) \ge 0$ and, if $E_0(X) < \infty$,
$\mathcal{L}^*_{N,M}(E_0(X)) = 0$.

Proof. The proofs of statements (a) and (c) can be found in [DZ], as
well as the proof of the differentiability property. The formulae (3.8)
are simple algebra. Finally, the equivalence of the three conditions
(3.9) is an application of Theorem 23.5 of [Ro] to the particular case
of a differentiable proper convex function.
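The identities in part (b) can be checked directly on a small example. The sketch below again assumes the Hopfield situation with symmetric $\pm1$ spins, for which $\mathcal{L}_{N,M}(t) = \frac1N\sum_i\ln\cosh((\xi_i,t))$, and compares the gradient and the Laplacian of this function with the mean and the total variance of $X$ under the tilted measure, the latter computed by enumerating all configurations. All names are hypothetical helpers for the illustration.

```python
import itertools
import math
import random

def fields(xi, t):
    """Scalar products (xi_i, t) for every site i."""
    return [sum(a * b for a, b in zip(row, t)) for row in xi]

def grad_log_mgf(xi, t):
    """Gradient of (1/N) sum_i log cosh((xi_i, t))."""
    N, M = len(xi), len(xi[0])
    th = [math.tanh(h) for h in fields(xi, t)]
    return [sum(xi[i][mu] * th[i] for i in range(N)) / N for mu in range(M)]

def laplacian_log_mgf(xi, t):
    """Laplacian of (1/N) sum_i log cosh((xi_i, t))."""
    N = len(xi)
    th = [math.tanh(h) for h in fields(xi, t)]
    return sum(sum(c * c for c in xi[i]) * (1 - th[i] ** 2) for i in range(N)) / N

def tilted_moments(xi, t):
    """Mean vector and total variance sum_mu Var(X_mu) under the tilted measure."""
    N, M = len(xi), len(xi[0])
    h = fields(xi, t)
    Z, mean, second = 0.0, [0.0] * M, 0.0
    for sigma in itertools.product((-1, 1), repeat=N):
        w = math.exp(sum(s * f for s, f in zip(sigma, h)))
        X = [sum(xi[i][mu] * sigma[i] for i in range(N)) / N for mu in range(M)]
        Z += w
        mean = [m + w * x for m, x in zip(mean, X)]
        second += w * sum(x * x for x in X)
    mean = [m / Z for m in mean]
    return mean, second / Z - sum(m * m for m in mean)

rng = random.Random(0)
N, M = 8, 2
xi = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(N)]
t = [0.4, -0.2]
mean, total_variance = tilted_moments(xi, t)
print(mean, grad_log_mgf(xi, t))
print(total_variance, laplacian_log_mgf(xi, t) / N)
```

The second comparison is the trace of the Hessian identity in (3.8): the total variance of $X$ under the tilted measure equals $\frac1N\Delta\mathcal{L}_{N,M}(t)$.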



Setting

$$\Psi_{N,M}(x) \equiv -E_M(x) + \mathcal{L}^*_{N,M}(x), \quad x \in \mathbb{R}^M$$  (3.10)

we have

Lemma 3.4. For any $x^*$ in $\mathbb{R}^M$, define
$t^* \equiv t^*(x^*)$ through
$\mathcal{L}^*_{N,M}(x^*) = (t^*,x^*) - \mathcal{L}_{N,M}(t^*)$ if such
a $t^*$ exists, while otherwise $\|t^*\|_2 \equiv \infty$ (note that
$t^*$ need not be unique). We have, for any $\rho > 0$,

$$\frac1N\log Z_{N,\rho}(x^*) \le -\Psi_{N,M}(x^*) + \sup_{x\in B_\rho(x^*)}[E_M(x) - E_M(x^*)] + \rho\|t^*\|_2$$  (3.11)

and

$$\frac1N\log Z_{N,\rho}(x^*) \ge -\Psi_{N,M}(x^*) + \inf_{x\in B_\rho(x^*)}[E_M(x) - E_M(x^*)] - \rho\|t^*\|_2 + \frac1N\log\left(1 - \frac{1}{\rho^2N}\Delta\mathcal{L}_{N,M}(t^*)\right)$$  (3.12)

Proof. Analogous bounds were obtained in [BG4], Lemmata 2.1 and
2.2, in the special case of an application to the Hopfield model. The
proofs of (3.11) and (3.12) follow the proofs of these lemmata with only
minor modifications. We will only recall the main lines of the proof of
the lower bound: the essential step is to perform an exponential change
of measure, i.e., with the definition of $\tilde\nu_{N,t}$ from Lemma
3.3, we have

$$Z_{N,\rho}(x^*) = E_{t^*}\left(e^{N\{E_M(X) - (t^*,X)\}}\,\mathbb{1}_{\{B_\rho(x^*)\}}\right) E_0\left(e^{N(t^*,X)}\right)$$  (3.13)

from which, together with (3.5) and (3.9), we easily obtain

$$Z_{N,\rho}(x^*) \ge e^{N\{-\Psi_{N,M}(x^*) + \inf_{x\in B_\rho(x^*)}[E_M(x) - E_M(x^*)] - \rho\|t^*\|_2\}}\;\tilde\nu_{N,t^*}(B_\rho(x^*))$$  (3.14)

When the law of large numbers is not available, as is the case here,
the usual procedure to estimate the term $\tilde\nu_{N,t^*}(B_\rho(x^*))$
would be to use the upper bound. Here we simply use the Tchebychev
inequality to write

$$1 - \tilde\nu_{N,t^*}(B_\rho(x^*)) = E_{t^*}\left(\mathbb{1}_{\{\|X - x^*\|_2^2 > \rho^2\}}\right) \le \frac{1}{\rho^2}E_{t^*}\|X - x^*\|_2^2$$  (3.15)

Now, by (3.9), $t^*$ satisfies $\nabla\mathcal{L}_{N,M}(t^*) = x^*$,
and it follows from (3.8) that

$$\frac{1}{\rho^2}E_{t^*}\|X - x^*\|_2^2 = \frac{1}{\rho^2}\sum_{\mu=1}^M\left[E_{t^*}X_\mu^2 - \left(E_{t^*}X_\mu\right)^2\right] = \frac{1}{\rho^2N}\Delta\mathcal{L}_{N,M}(t^*)$$  (3.16)

Collecting (3.14), (3.15) and (3.16) proves (3.12).

Remark. The lower bound (3.12) is meaningful only if
$\frac{1}{\rho^2N}\Delta\mathcal{L}_{N,M}(t^*) < 1$. But the Laplacian
of a function on $\mathbb{R}^M$ has a tendency to be of order $M$.
Thus, typically, the lower bound will be useful only if
$\rho^2 \ge O(M/N)$. We see that if $\lim_{N\uparrow\infty}\frac MN = 0$,
one may shrink $\rho$ to $0$ and get upper and lower bounds that are
asymptotically the same (provided $E_M$ is continuous), provided the
norm of $t^*$ remains bounded. Since $t^*$ is random, this introduces
some subtleties which, however, can be handled (see [BG4]). But if
$\lim_{N\uparrow\infty}\frac MN = \alpha > 0$, we do not get a lower
bound for balls of radius smaller than $O(\sqrt\alpha)$ and there is no
hope to get a large deviation principle in the usual sense from Lemma
3.4. What is more disturbing is the fact that the quantities $\Psi$ and
$t^*$ are more or less impossible to compute in an explicit form, and
this makes Lemma 3.4 not a very good starting point for further
investigations.

3.2. Transfer principle. 

As we will show now, it is possible to get large deviation estimates
that do not involve the computation of Legendre transforms. The price
to pay will be that these will not be sharp everywhere. But as we
will see, they are sharp at the locations of the extrema and thus are
sufficient for many purposes. Let us define the function

$$\Phi_{N,M}(x) \equiv -E_M(x) + (x,\nabla E_M(x)) - \mathcal{L}_{N,M}(\nabla E_M(x))$$  (3.17)

Theorem 3.5.

(i) Let $x^*$ be a point in $\mathbb{R}^M$ such that for some
$\rho_0 > 0$, for all $x, x' \in B_{\rho_0}(x^*)$,
$\|\nabla E_M(x) - \nabla E_M(x')\|_2 \le c\|x - x'\|_2$. Then, for all
$0 < \rho < \rho_0$,

$$\frac1N\log Z_{N,\rho}(x^*) \le -\Phi_{N,M}(x^*) + \frac12c\rho^2$$  (3.18)

(ii) Let $x^*$ be a point such that
$\nabla\mathcal{L}_{N,M}(\nabla E_M(x^*)) = x^*$. Then,

$$\frac1N\log Z_{N,\rho}(x^*) \ge -\Phi_{N,M}(x^*) + \frac1N\log\left(1 - \frac{1}{\rho^2N}\Delta\mathcal{L}_{N,M}(\nabla E_M(x^*))\right)$$  (3.19)
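To see what the function (3.17) and the condition of part (ii) look like in a concrete case, consider the Hopfield model with the inverse temperature absorbed into the potential, $E_M(x) = \frac\beta2\|x\|_2^2$ (an assumption of this example). Then $\nabla E_M(x) = \beta x$, and with $\mathcal{L}_{N,M}(t) = \frac1N\sum_i\ln\cosh((\xi_i,t))$ one gets $\Phi_{N,M}(x) = \frac\beta2\|x\|_2^2 - \frac1N\sum_i\ln\cosh(\beta(\xi_i,x))$, while the condition in (ii) reads $x = \frac1N\sum_i\xi_i\tanh(\beta(\xi_i,x))$. The sketch below solves this equation by plain fixed-point iteration started at the first unit vector; the iteration scheme, names and parameters are illustrative choices.

```python
import math
import random

def local_fields(xi, x, beta):
    """beta * (xi_i, x) for every site i."""
    return [beta * sum(a * b for a, b in zip(row, x)) for row in xi]

def phi(xi, x, beta):
    """Phi(x) = (beta/2) ||x||^2 - (1/N) sum_i log cosh(beta (xi_i, x))."""
    N = len(xi)
    quadratic = 0.5 * beta * sum(v * v for v in x)
    return quadratic - sum(math.log(math.cosh(h)) for h in local_fields(xi, x, beta)) / N

def grad_L_at_grad_E(xi, x, beta):
    """(1/N) sum_i xi_i tanh(beta (xi_i, x)), i.e. grad L evaluated at grad E(x)."""
    N, M = len(xi), len(xi[0])
    th = [math.tanh(h) for h in local_fields(xi, x, beta)]
    return [sum(xi[i][mu] * th[i] for i in range(N)) / N for mu in range(M)]

def solve_fixed_point(xi, beta, x0, iterations=100):
    """Iterate x -> grad L(grad E(x)); converges here because the map is a contraction nearby."""
    x = list(x0)
    for _ in range(iterations):
        x = grad_L_at_grad_E(xi, x, beta)
    return x

rng = random.Random(0)
N, M, beta = 500, 3, 2.0
xi = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(N)]
x_star = solve_fixed_point(xi, beta, [1.0] + [0.0] * (M - 1))
image = grad_L_at_grad_E(xi, x_star, beta)
residual = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_star, image)))
print(x_star, residual, phi(xi, x_star, beta))
```

The solution found this way has its first component close to the Curie-Weiss magnetization and small remaining components, and $\Phi$ is negative there, below its value $0$ at the origin.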

Remark. The condition $\nabla\mathcal{L}_{N,M}(\nabla E_M(x^*)) = x^*$
is equivalent to the condition $\nabla\Psi_{N,M}(x^*) = 0$, if
$\mathcal{L}^*$ is essentially smooth. This means that the lower bound
holds at all critical points of the "true" rate function. It is easy to
see that $\nabla\Psi_{N,M}(x) = 0$ implies $\nabla\Phi_{N,M}(x) = 0$,
while the converse is not generally true. Fortunately, however, this is
true for critical points of $\Phi_{N,M}$ that are minima. This fact
will be established in the remainder of this section.

Remark. It is clear that we could get an upper bound with error
term $C\rho$ without the hypothesis that $\nabla E_M$ is Lipschitz.
However, when we apply Theorem 3.5, a good estimate on the error will
be important⁵, while local Lipschitz bounds on $\nabla E_M$ are readily
available.

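Proof. With the definition of $\tilde\nu_{N,t}$ from Lemma 3.3, we have

$$Z_{N,\rho}(x^*) = E_t\left(e^{N\{E_M(X) - (t,X)\}}\,\mathbb{1}_{\{B_\rho(x^*)\}}\right) E_0\left(e^{N(t,X)}\right)$$
$$= e^{N\{\mathcal{L}_{N,M}(t) + E_M(x^*) - (t,x^*)\}}\; E_t\left(e^{N\{E_M(X) - E_M(x^*) - (t,(X - x^*))\}}\,\mathbb{1}_{\{B_\rho(x^*)\}}\right)$$  (3.20)

The strategy is now to choose $t$ in such a way as to get optimal
control over the last exponent in (3.20). By the fundamental theorem of
calculus,

$$|E_M(X) - E_M(x^*) - (t,(X - x^*))| = \left|\int_0^1 ds\,\big((\nabla E_M(sX + (1-s)x^*) - t),(X - x^*)\big)\right|$$
$$\le \sup_{s\in[0,1]}\|\nabla E_M(sX + (1-s)x^*) - t\|_2\,\|X - x^*\|_2$$  (3.21)

Of course we want a bound that is uniform in the set of $X$ we
consider, so that the best choice is $t \equiv \nabla E_M(x^*)$. Since
$\nabla E_M(x)$ was assumed to be Lipschitz in $B_\rho(x^*)$, the
integrand in (3.21) is then bounded by
$cs\|X - x^*\|_2^2 \le cs\rho^2$, and integrating over $s$ we get

$$Z_{N,\rho}(x^*) \le e^{N\{\mathcal{L}_{N,M}(\nabla E_M(x^*)) + E_M(x^*) - (\nabla E_M(x^*),x^*)\}}\,e^{\frac12Nc\rho^2} = e^{-N\Phi_{N,M}(x^*) + \frac12Nc\rho^2}$$  (3.22)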

where the last equality follows from the definition (3.17). This proves
the upper bound (3.18). To prove the lower bound, note that since $E_M$
is convex,

$$E_M(X) - E_M(x^*) - (\nabla E_M(x^*),(X - x^*)) \ge 0$$  (3.23)

Using this in the last factor of (3.20), we get

$$Z_{N,\rho}(x^*) \ge e^{-N\Phi_{N,M}(x^*)}\,\tilde\nu_{N,t}(B_\rho(x^*))$$  (3.24)

Now, just as in (3.15),

$$1 - \tilde\nu_{N,t}(B_\rho(x^*)) \le \frac{1}{\rho^2}E_t\|X - x^*\|_2^2$$  (3.25)

and a simple calculation as in Section 3.1 shows that

$$E_t\|X - x^*\|_2^2 = \frac1N\Delta\mathcal{L}_{N,M}(t) + \|\nabla\mathcal{L}_{N,M}(t) - x^*\|_2^2$$  (3.26)

Here we see that the optimal choice for $t$ would be the solution of
$\nabla\mathcal{L}_{N,M}(t) = x^*$, an equation we did not like before.
However, we now have by assumption
$\nabla\mathcal{L}_{N,M}(\nabla E_M(x^*)) = x^*$. This concludes the
proof of Theorem 3.5.

⁵ The point is that the number of balls of radius $\rho$ needed to
cover, say, the unit ball is of the order $\rho^{-\alpha N}$, that is,
exponentially large. Therefore we want to use as large a $\rho$ as
possible with as small an error as possible. Such problems do not occur
when the dimension of the space is independent of $N$.
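The "simple calculation" behind (3.26), and the way it closes the proof of the lower bound, can be spelled out as follows. This is a sketch that takes the identities of Lemma 3.3(b) for granted, writes $X_\mu$ for the components of $X$, and uses that the cross term $E_t[(X_\mu - E_tX_\mu)(E_tX_\mu - x^*_\mu)]$ vanishes.

```latex
\begin{align*}
E_t\|X-x^*\|_2^2
 &= \sum_{\mu=1}^M E_t\bigl(X_\mu - E_tX_\mu\bigr)^2
   + \sum_{\mu=1}^M \bigl(E_tX_\mu - x^*_\mu\bigr)^2 \\
 &= \frac1N\,\Delta\mathcal{L}_{N,M}(t)
   + \bigl\|\nabla\mathcal{L}_{N,M}(t)-x^*\bigr\|_2^2 .
\end{align*}
% With t = \nabla E_M(x^*) and the hypothesis of part (ii), the last term vanishes, so
\begin{align*}
\tilde\nu_{N,t}\bigl(B_\rho(x^*)\bigr)
 \;\ge\; 1-\frac{1}{\rho^2N}\,\Delta\mathcal{L}_{N,M}\bigl(\nabla E_M(x^*)\bigr).
\end{align*}
```

Inserting the last inequality into the lower bound on $Z_{N,\rho}(x^*)$ and taking $\frac1N\log$ gives the statement of part (ii).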



Sometimes the estimates on the probabilities of $\ell^2$-balls may not
be the most suitable ones. A charming feature of the upper bound is
that it can also be extended to sets that are adapted to the function
$E_M$. Namely, if we define

$$\tilde Z_{N,\rho}(x^*) \equiv \int e^{NE_M(x)}\,\mathbb{1}_{\{\|\nabla E_M(x) - \nabla E_M(x^*)\|_2 < \rho\}}\,d\nu_N(x)$$  (3.27)

we get

Theorem 3.6. Assume that for some $q \le 1$, for all
$y, y' \in B_{\rho_0}(\nabla E_M(x^*))$,
$\|(\nabla E_M)^{-1}(y) - (\nabla E_M)^{-1}(y')\|_2 \le c\|y - y'\|_2^q$.
Then for all $0 < \rho < \rho_0$,

$$\frac1N\log\tilde Z_{N,\rho}(x^*) \le -\Phi_{N,M}(x^*) + c\rho^{1+q}$$  (3.28)

The proof of this Theorem is a simple rerun of that of the upper
bound in Theorem 3.5 and is left to the reader.

We now want to make the remark following Theorem 3.5 precise.

Proposition 3.7. Assume that $E_M$ is strictly convex and essentially
smooth. If $\Phi_{N,M}$ has a local extremum at a point $x^*$ in the
interior of its domain, then
$\nabla\mathcal{L}_{N,M}(\nabla E_M(x^*)) = x^*$.

Proof. To prove this proposition, we recall a fundamental Theorem
on functions of Legendre type from [Ro].

Definition 3.8. Let $h$ be a differentiable real-valued function on an
open subset $C$ of $\mathbb{R}^M$. The Legendre conjugate of the pair
$(C, h)$ is defined to be the pair $(D, g)$ where $D = \nabla h(C)$ and
$g$ is the function on $D$ given by the formula

$$g(x^*) = \left((\nabla h)^{-1}(x^*),x^*\right) - h\left((\nabla h)^{-1}(x^*)\right)$$  (3.29)

Passing from $(C, h)$ to $(D, g)$, if the latter is well defined, is
called the Legendre transformation.

Definition 3.9. Let C be an open convex set and h an essentially 
smooth and strictly convex function on C . The pair (C, h) will be called 
a convex function of Legendre type. 

The Legendre conjugate of a convex function of Legendre type is 
related to the ordinary conjugate as follows: 

Theorem 3.10. ([Ro], Theorem 26.5) Let $h$ be a closed convex function.
Let $C = \mathrm{int}(\mathrm{dom}\,h)$ and
$C^* = \mathrm{int}(\mathrm{dom}\,h^*)$. Then $(C, h)$ is a convex
function of Legendre type if and only if $(C^*, h^*)$ is a convex
function of Legendre type. When these conditions hold, $(C^*, h^*)$ is
the Legendre conjugate of $(C, h)$, and $(C, h)$ is in turn the
Legendre conjugate of $(C^*, h^*)$. The gradient mapping is then
one-to-one from the open convex set $C$ onto the open convex set $C^*$,
continuous in both directions, and

$$\nabla h^* = (\nabla h)^{-1}$$  (3.30)

With this tool at our hands, let us define the function
$\psi_{N,M}(x) \equiv E^*_M(x) - \mathcal{L}_{N,M}(x)$. The crucial
point is that since $E_M$ is of Legendre type, by Definition 3.8 and
Theorem 3.10, we get

$$\Phi_{N,M}(x) = \psi_{N,M}(\nabla E_M(x))$$  (3.31)

Moreover, since $\nabla E_M$ is one-to-one and continuous, $\Phi_{N,M}$
has a local extremum at $x^*$ if and only if $\psi_{N,M}$ has a local
extremum at the point $y^* = \nabla E_M(x^*)$. In particular,
$\nabla\psi_{N,M}(y^*) = 0$. Thus,
$\nabla E^*_M(y^*) = \nabla\mathcal{L}_{N,M}(y^*)$, and by (3.30),
$(\nabla E_M)^{-1}(y^*) = \nabla\mathcal{L}_{N,M}(y^*)$, or
$x^* = \nabla\mathcal{L}_{N,M}(\nabla E_M(x^*))$, which was to be
proven.



The proposition asserts that at the minima of $\Phi$, the condition of
part (ii) of Theorem 3.5 is satisfied. Therefore, if we are interested
in establishing localization properties of our measures, we only need
to compute $\Phi$ and work with it as if it were the true rate
function. This will greatly simplify the analysis in the models we are
interested in.

Remark. If $\mathcal{L}$ is of Legendre type, it follows by the same
type of argument that $x^*$ is a critical point of $\Psi$ if and only
if $\nabla E_M(x^*)$ is a critical point of $\psi$. Moreover, at such
critical points, $\Psi(x^*) = \psi(\nabla E_M(x^*))$. Thus in this
situation, if $x^*$ is a critical point of $\Psi$, then $x^*$ is a
critical point of $\Phi$ and $\Psi(x^*) = \Phi(x^*)$. Conversely, by
Proposition 3.7, if $\Phi$ has a local extremum at $x^*$, then $x^*$ is
a critical point of $\Psi$ and $\Phi(x^*) = \Psi(x^*)$. Since generally
$\Psi(x) \ge \Phi(x)$, this implies also that if $\Phi$ has a minimum
at $x^*$, then $\Psi$ has a minimum at $x^*$. One can build on the
above observations and establish a more complete "duality principle"
between the functions $\Phi$ and $\Psi$ in great generality, but we
will not make use of these observations. The interested reader will
find details in [G2].

4. Bounds on the norm of random matrices



One of the crucial observations that triggered the recent progress
in the Hopfield model was the observation that the properties of the
random matrix $A(N) \equiv \frac{\xi^T\xi}{N}$ play a crucial role in
this model, and that their main feature is that as long as $M/N$ is
small, $A(N)$ is close to the identity matrix. This observation in a
sense provided the proper notion for the intuitive feeling that in this
case, "all patterns are almost orthogonal to each other". Credit must
go to both Koch [K] and Shcherbina and Tirozzi [TS] for making this
observation, although the properties of the matrices $A(N)$ had been
known a long time before. In fact it is known that under the hypothesis
that $\xi_i^\mu$ are independent, identically distributed random
variables with $\mathbb{E}\xi_i^\mu = 0$, $\mathbb{E}[\xi_i^\mu]^2 = 1$
and $\mathbb{E}[\xi_i^\mu]^4 < \infty$, the maximal and minimal
eigenvalues of $A(N)$ satisfy

$$\lim_{N\uparrow\infty}\lambda_{\max}(A(N)) = (1 + \sqrt\alpha)^2, \quad \text{a.s.}$$  (4.1)

This statement was proven in [YBK] under the above (optimal)
hypotheses. For prior results under stronger assumptions, see
[Ge,Si,Gi]. Such results are generally proven by tedious combinatorial
methods, combined with truncation techniques. Estimates for deviations
that were available from such methods give only subexponential
estimates; the best bounds known until recently, to our knowledge, were
due to Shcherbina and Tirozzi [ST] and gave, in the case where
$\xi_i^\mu$ are symmetric Bernoulli random variables,

$$\mathbb{P}\left[\|A(N) - \mathbb{1}\| > [(1 + \sqrt\alpha)^2 - 1](1 + \epsilon)\right] \le \exp\left(-\frac{\epsilon^{4/3}M^{2/3}}{K}\right)$$  (4.2)

with $K$ a numerical constant and valid for small $\epsilon$. More
recently, a bound of the form $\exp\left(-\frac{\epsilon^2N}{K}\right)$
was proven by the authors in [BG5], using a concentration estimate due
to Talagrand. In [T4] a simplified version of that proof is given. We
will now give the simplest proof of such a result we can think of.
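The almost sure limit of the largest eigenvalue is easy to observe numerically. The following sketch draws symmetric Bernoulli patterns, forms the $M\times M$ matrix $\xi^T\xi/N$ and estimates its largest eigenvalue by power iteration. The sizes $N = 400$, $M = 40$, the number of iterations and all function names are choices made for this illustration, and only the Python standard library is used.

```python
import math
import random

def gram_matrix(xi):
    """A = xi^T xi / N for patterns stored as an N x M list of lists."""
    N, M = len(xi), len(xi[0])
    A = [[0.0] * M for _ in range(M)]
    for row in xi:
        for mu in range(M):
            for nu in range(M):
                A[mu][nu] += row[mu] * row[nu] / N
    return A

def largest_eigenvalue(A, iterations=300, seed=1):
    """Power iteration followed by a Rayleigh quotient (A symmetric, non-negative)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(A))]
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(row, v)) for row in A]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(a * b for a, b in zip(row, v)) for row in A]
    return sum(a * b for a, b in zip(v, Av))

rng = random.Random(0)
N, M = 400, 40
alpha = M / N
xi = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(N)]
lam_max = largest_eigenvalue(gram_matrix(xi))
limit = (1 + math.sqrt(alpha)) ** 2
print(lam_max, limit)
```

At these sizes the estimate typically falls slightly below the limiting value, as expected from finite-size corrections at the edge of the spectrum.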

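Let us define for an $M \times M$-matrix $A$ the norm

$$\|A\| \equiv \sup_{x\in\mathbb{R}^M,\,\|x\|_2 = 1}(x,Ax)$$  (4.3)

For positive symmetric matrices it is clear that $\|A\|$ is the maximal
eigenvalue of $A$. We shall also use the notation
$\|A\|_2 \equiv \sqrt{\sum_{\mu,\nu}A_{\mu\nu}^2}$.

Theorem 4.1. Assume that $\mathbb{E}\xi_i^\mu = 0$,
$\mathbb{E}[\xi_i^\mu]^2 = 1$ and $|\xi_i^\mu| \le 1$. Then there
exists a numerical constant $K$ such that for large enough $N$, the
following holds for all $\epsilon \ge 0$ and all $\alpha \ge 0$:

$$\mathbb{P}\left[\left|\|A(N)\| - (1 + \sqrt\alpha)^2\right| \ge \epsilon\right] \le K\exp\left(-N\frac{(1 + \sqrt\alpha)^2}{K}\left(\sqrt{1 + \frac{\epsilon}{(1 + \sqrt\alpha)^2}} - 1\right)^2\right)$$  (4.4)

Proof. Let us define for the rectangular matrix $\xi$

$$\|\xi\|_+ \equiv \sup_{x\in\mathbb{R}^M,\,\|x\|_2 = 1}\|\xi x\|_2$$  (4.5)

Clearly

$$\|A(N)\| = \|\xi/\sqrt N\|_+^2$$  (4.6)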



Motivated by this remark we show first that $\|\xi/\sqrt N\|_+$ has
nice concentration properties. For this we will use the following
theorem due to Talagrand:

Theorem 4.2. (Theorem 6.6 in [T2]) Let $f$ be a real valued function
defined on $[-1,1]^N$. Assume that for each real number $a$, the set
$\{f \le a\}$ is convex. Suppose that on a convex set
$B \subset [-1,1]^N$ the restriction of $f$ to $B$ satisfies for all
$x, y \in B$

$$|f(x) - f(y)| \le l_B\|x - y\|_2$$  (4.7)

for some constant $l_B > 0$. Let $h$ denote the random variable
$h = f(X_1, \dots, X_N)$. Then, if $M_f$ is a median of $h$, for all
$t > 0$,

$$\mathbb{P}\left[|h - M_f| \ge t\right] \le 4b + \frac{4}{1 - 2b}\exp\left(-\frac{t^2}{16\,l_B^2}\right)$$  (4.8)

where $b$ denotes the probability of the complement of the set $B$.

To make use of this theorem, we show first that $\|\xi/\sqrt N\|_+$ is
a Lipschitz function of the i.i.d. variables $\xi_i^\mu$:

Lemma 4.3. For any two matrices $\xi$, $\xi'$ we have that

$$\left|\|\xi\|_+ - \|\xi'\|_+\right| \le \|\xi - \xi'\|_2$$  (4.9)

Proof. We have

$$\left|\|\xi\|_+ - \|\xi'\|_+\right| \le \sup_{\|x\|_2 = 1}\left|\|\xi x\|_2 - \|\xi'x\|_2\right|$$
$$\le \sup_{\|x\|_2 = 1}\|\xi x - \xi'x\|_2$$
$$\le \sup_{\|x\|_2 = 1}\|x\|_2\sqrt{\sum_{i=1}^N\sum_{\mu=1}^M(\xi_i^\mu - \xi_i'^\mu)^2} = \|\xi - \xi'\|_2$$  (4.10)

where in the first inequality we used that the modulus of the
difference of suprema is bounded by the supremum of the modulus of the
differences, the second follows from the triangle inequality and the
third from the Schwarz inequality.



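Next, note that as a function of the variables
$\xi \in [-1,1]^{MN}$, $\|\xi\|_+$ is convex. Thus, by Theorem 4.2, it
follows that for all $t > 0$,

$$\mathbb{P}\left[\left|\|\xi/\sqrt N\|_+ - M_{\|\xi/\sqrt N\|_+}\right| \ge t\right] \le 4e^{-N\frac{t^2}{16}}$$  (4.11)

where $M_{\|\xi/\sqrt N\|_+}$ is a median of $\|\xi/\sqrt N\|_+$.
Knowing that $\|A(N)\|$ converges almost surely to the values given in
(4.1) we may without harm replace the median by this value. Thus

$$\mathbb{P}\left[\|A(N)\| - (1 + \sqrt\alpha)^2 \ge \epsilon\right]
= \mathbb{P}\left[\|\xi/\sqrt N\|_+ - (1 + \sqrt\alpha) \ge (1 + \sqrt\alpha)\left(\sqrt{1 + \frac{\epsilon}{(1 + \sqrt\alpha)^2}} - 1\right)\right]$$
$$\le 4\exp\left(-N(1 + \sqrt\alpha)^2\left(\sqrt{1 + \frac{\epsilon}{(1 + \sqrt\alpha)^2}} - 1\right)^2\Big/16\right)$$  (4.12)

and similarly, for $0 \le \epsilon \le (1 + \sqrt\alpha)^2$,

$$\mathbb{P}\left[\|A(N)\| - (1 + \sqrt\alpha)^2 \le -\epsilon\right]
= \mathbb{P}\left[\|\xi/\sqrt N\|_+ - (1 + \sqrt\alpha) \le (1 + \sqrt\alpha)\left(\sqrt{1 - \frac{\epsilon}{(1 + \sqrt\alpha)^2}} - 1\right)\right]$$
$$\le 4\exp\left(-N(1 + \sqrt\alpha)^2\left(\sqrt{1 - \frac{\epsilon}{(1 + \sqrt\alpha)^2}} - 1\right)^2\Big/16\right)$$  (4.13)

while trivially
$\mathbb{P}\left[\|A(N)\| - (1 + \sqrt\alpha)^2 \le -\epsilon\right] = 0$
for $\epsilon > (1 + \sqrt\alpha)^2$. Using that for $0 \le x \le 1$,
$(\sqrt{1 - x} - 1)^2 \ge (\sqrt{1 + x} - 1)^2$, we get Theorem 4.1.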
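The elementary inequality used in the last step is not proved in the text; one possible short argument, for $0 \le x \le 1$, is the following.

```latex
\begin{align*}
\bigl(\sqrt{1+x}+\sqrt{1-x}\bigr)^2 = 2+2\sqrt{1-x^2}\le 4
 &\;\Longrightarrow\; \sqrt{1+x}+\sqrt{1-x}\le 2 \\
 &\;\Longrightarrow\; 1-\sqrt{1-x}\;\ge\;\sqrt{1+x}-1\;\ge\;0 \\
 &\;\Longrightarrow\; \bigl(\sqrt{1-x}-1\bigr)^2\ge\bigl(\sqrt{1+x}-1\bigr)^2 .
\end{align*}
```

Applied with $x = \epsilon/(1+\sqrt\alpha)^2$, it shows that the bound for downward deviations is at least as strong as the one for upward deviations, so that both can be combined into a single two-sided estimate.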



Remark. Instead of using the almost sure results (4.1), it would
also be enough to use estimates on the expectation of $\|A(N)\|$ to
prove Theorem 4.1. We see that the proof required no computation
whatsoever; it uses however that we know the medians or expectations.
The boundedness condition on $\xi_i^\mu$ arises from the conditions in
Talagrand's Theorem. It is likely that these could be relaxed.

Remark. In the sequel of the paper we will always assume that our
general assumptions on $\xi$ are such that Theorem 4.1 holds. Of
course, since exponential bounds are mostly not really necessary, one
may also get away in more general situations. On the other hand, we
shall see in Section 6 that unbounded $\xi_i^\mu$ cause other problems
as well.

5. Properties of the induced measures



In this section we collect the general results on the localization (or
concentration) of the induced measures in dependence on properties of
the function $\Phi_{\beta,N,M}$ introduced in the previous section.
There are two parts to this. Our first theorem will be a rather simple
generalization of what could be called the "Laplace method". It states,
roughly, the (hardly surprising) fact that the Gibbs measures are
concentrated "near" the absolute minima of $\Phi$. A second, and less
trivial remark states that quite generally, the Gibbs measures "respect
the symmetry of the law of the disorder". We will make precise what
that means.

5.1 Localization of the induced measures. 

The following Theorem will tell us what we need to know about
the function $\Phi$ in order to locate the support of the limiting
measures $Q$.

Theorem 5.1. Let $A \subset \mathbb{R}^\infty$ be a set such that for
all $N$ sufficiently large the following holds:

(i) There is $n \in A$ such that for all $m \in A^c$,

$$\Phi_{\beta,N,M(N)}[\omega](m) - \Phi_{\beta,N,M(N)}[\omega](n) \ge C\alpha$$  (5.1)

for $C > c$ sufficiently large, with $c$ the constant from (i) of
Theorem 3.5.

(ii) $\Delta\mathcal{L}_{N,M}(\nabla E_M(n)) \le KM$ for some
$K < \infty$, and $B_{K\sqrt\alpha}(n) \subset A$.

Assume further that $\Phi$ satisfies a tightness condition, i.e. there
exists a constant, $a$, sufficiently small (depending on $C$), such
that for all $r > C\alpha$

$$\ell\left(\{m \mid \Phi_{\beta,N,M}[\omega](m) - \Phi_{\beta,N,M}[\omega](n) \le r\}\right) \le r^{M/2}a^MM^{-M/2}$$  (5.2)

where $\ell(\cdot)$ denotes the Lebesgue measure. Then there is
$L > 0$ such that

$$Q_{\beta,N,M(N)}[\omega](A^c) \le e^{-LM}$$  (5.3)

and in particular

$$\lim_{N\uparrow\infty}Q_{\beta,N,M(N)}[\omega](A) = 1$$  (5.4)

Remark. Condition (5.2) is verified, e.g., if $\Phi$ is bounded from
below by a quadratic function.

Proof. To simplify notation, we put w.l.o.g.
$\Phi_{\beta,N,M}[\omega](n) = 0$. Note first that by (ii) and (3.19)
we have that (for suitably chosen $\rho$)

$$Z_{\beta,N,M(N)}[\omega]\,Q_{\beta,N,M(N)}[\omega](A) \ge \frac12e^{-\beta N\Phi_{\beta,N,M}[\omega](n)} = \frac12$$  (5.5)

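It remains to show that the remainder has much smaller mass. Note
that obviously, by (i),

$$Q_{\beta,N,M(N)}[\omega](A^c) \le \int_{C\alpha}^\infty dr\,Q_{\beta,N,M(N)}[\omega]\left(A^c \cap \{m \mid \Phi_{\beta,N,M}(m) = r\}\right)$$
$$\le \int_{C\alpha}^\infty dr\,Q_{\beta,N,M(N)}[\omega]\left(\{m \mid \Phi_{\beta,N,M}(m) = r\}\right)$$  (5.6)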

Now we introduce a lattice $\mathcal{W}_{M,\alpha}$ of spacing
$1/\sqrt N$ in $\mathbb{R}^M$. The point here is that any domain
$D \subset \mathbb{R}^M$ is covered by the union of balls of radius
$\rho = \sqrt\alpha$ centered at the lattice points in $D$, while the
number of lattice points in any reasonably regular set $D$ is smaller
than $\ell(D)N^{M/2}$ (see e.g. [BG5] for more details). Combining this
observation with the upper bound (3.18), we get from (5.6) that

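$$\begin{aligned}
Z_{\beta,N,M(N)}[\omega]\, Q_{\beta,N,M(N)}[\omega](A^c)
&\leq \int_{C\alpha}^{\infty} dr\, e^{-\beta N r}\, \ell\left(\{m \mid \Phi_{\beta,N,M}(m) \leq r\}\right) N^{M/2} e^{\beta M c/2} \\
&\leq \int_{C\alpha}^{\infty} dr\, e^{-\beta N r}\, r^{M/2} a^M \alpha^{-M/2} e^{\beta M c/2} \\
&\leq a^M e^{\beta M c/2}\, \alpha \int_{C}^{\infty} dr\, e^{-\beta M r}\, r^{M/2} \qquad (5.7) \\
&\leq a^M e^{\beta M c/2} e^{-\beta M C/2}\, \alpha \int_{C}^{\infty} dr\, e^{-\beta M r/2}\, r^{M/2}
\end{aligned}$$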

r ol M 

e -pM[C/2-c/2-\na/p] N -l _£_ 

e/3 

which clearly for $\beta > 1$ can be made exponentially small in $M$ for $C$ sufficiently large. Combined with (5.5) this proves (5.3). (5.4) follows by a standard Borel-Cantelli argument.



Remark. We see at this point why it was important to get the error terms of order $\rho^2$ in the upper bound of Theorem 3.5; this allows us to choose $\rho \sim \sqrt{\alpha}$. Otherwise, e.g. when we are in a situation where we want to use Theorem 5.6, we could of course choose $\rho$ to be some higher power of $\alpha$, e.g. $\rho = \alpha$. This then introduces an extra factor $e^{M|\ln\alpha|}$, which can be offset only by choosing $C \sim |\ln\alpha|$, which of course implies slightly worse estimates on the sets where $Q$ is localized.

5.2 Symmetry and concentration of measure. 

Theorem 5.1 allows us to localize the measure near the "reasonable candidates" for the absolute minima of $\Phi$. As we will see, frequently, and in particular in the most interesting situation where we expect a phase transition, the smallest set $A$ satisfying the hypothesis of Theorem 5.1 that we can find will still be a union of disjoint sets. The components of this set are typically linked by "symmetry". In such a situation we would like to be able to compare the exact mass of the individual components, a task that goes beyond the possibilities of the explicit large deviation estimates. It is the idea of concentration of measure that allows us to make use of the symmetry of the distribution $\mathbb{P}$ here. This fact was first noted in [BGP3], and a more elegant proof in the Hopfield model that made use of the Hubbard-Stratonovich transformation was given first in [BG5] and independently in [T4].

Here we give a very simple proof that works in more general situations. The basic problem we are facing is the following. Suppose we are in a situation where the set $A$ from Theorem 5.1 can be decomposed as $A = \cup_k A_k$ for some collection of disjoint sets $A_k$. Define

$$f_N[\omega](k) \equiv -\frac{1}{\beta N} \ln \mathbb{E}_\sigma\, e^{-\beta H_N[\omega](\sigma)}\, \mathbb{1}_{\{m_{N,M}[\omega](\sigma) \in A_k\}} \qquad (5.8)$$

Assume that for all $k$

$$\mathbb{E} f_N[\omega](k) = \mathbb{E} f_N[\omega](1) \qquad (5.9)$$

(Think of $A_k = B_\rho(m^* e^k)$ in the standard Hopfield model.) We want to show that this implies that for all $k$, $|f_N[\omega](k) - f_N[\omega](1)|$ is "small" with large probability. Of course we should show this by proving that each $f_N[\omega](k)$ is close to its mean, and such a result is typically given by concentration estimates. To prove this would be easy, if it were not for the indicator function in (5.8), whose argument depends on the random parameter $\omega$ as well as the Hamiltonian. Our strategy will be to introduce quantities $\tilde f^\epsilon_N(k)$ that are close to $f_N(k)$, and for which it is easy to prove the concentration estimates. We will then control the difference between $\tilde f^\epsilon_N(k)$ and $f_N(k)$. We set

(5.10)

Note that the idea is that the Gaussian kernel $e^{-\frac{\beta N}{2\epsilon}\|m_{N,M}(\sigma) - m\|_2^2}$, suitably normalized, converges to the Dirac distribution concentrated on $m_{N,M}(\sigma)$, so that $\tilde f^\epsilon_N(k)$ converges to $f_N(k)$ as $\epsilon \downarrow 0$. Of course we will have to be a bit more careful than just that. However, Talagrand's Theorem 6.6 of [T2] readily gives

Proposition 5.2. Assume that $\xi$ verifies the assumptions of Theorem 4.1 and $\mathcal{S}$ is compact. Then there is a finite universal constant $C$ such that for all $\epsilon > 0$,

F [\f N (k) - Ef N (k)\ >x]< Ce~ M + Ce-^ (5.11) 



Proof. We must establish a Lipschitz bound for $\tilde f^\epsilon_N[\omega](k)$. For notational simplicity we drop the superfluous indices $N$ and $k$ and set $\tilde f^\epsilon[\omega] \equiv \tilde f^\epsilon_N[\omega](k)$. Now



0N 
1 



In 



E CT j dme~^\\ mN - M ^ a ^~ m ^e l3NEM ( m ^> 



E a J dme~^^ mN ' M ^'^~ m ^e l3NEM ^' m ) 



(3N 



E a J dm e~ ^^ mN ' M ^'^ a ~^~ m ^e l3NEM< - r 



In 



E a f dme~^^ mN ' M ^'^~ m ^el 3NEM ^ 

^(\\rn NiM [u](v)-rn\\l-\\rn NiM [u'](cT)-rn\\l) 



1 

< - 



sup |||toat,mM((t) - m\\ 2 - \\m NjM [u'](o-) -m\\ 2 \ 

e aeS N ,mEA k 



(5.12) 



But 



\\\m N:M [u](o-) - m\\l - \\m NM [u']{o-) - m\\ 2 2 \ 

< \\m N , M W]{o-) - m N , M [w]{ (J )h\\ 2 ' m ~ %,JfHW - "7,iV,MK](c r )||2 

1 



< 



N 



neK]-eMii 2 r+c(^a\u\\\ + Vu 



(5.13)

where $R$ is a bound for $m$ on $A_k$. We wrote $A[\omega]$ to make the dependence of the random matrices on the random parameter explicit. Note that this estimate is uniform in $\sigma$ and $m$. It is easy to see that $\tilde f^\epsilon[\omega]$ has convex level sets so that the assumptions of Theorem 6.6 of [T1] are verified. Proposition 5.2 follows from here and the bounds on $\|A[\omega]\|$ given by Theorem 4.1.

We see from Proposition 5.2 that we can choose an $\epsilon = N^{-\delta_1}$ and an $x = N^{-\delta_2}$ with $\delta_1, \delta_2 > 0$ and still get a probability that decays faster than any power of $N$.

Let us now see more precisely how $\tilde f^\epsilon[\omega]$ and $f^0[\omega]$ are related. Let us introduce as an intermediate step the $\epsilon$-smoothed measures

$$Q^\epsilon_{\beta,N,M}[\omega] \equiv Q_{\beta,N,M}[\omega] \star \mathcal{N}\left(0, \frac{\epsilon}{\beta N}\mathbb{1}\right) \qquad (5.14)$$

where $\mathcal{N}\left(0, \frac{\epsilon}{\beta N}\mathbb{1}\right)$ is an $M$-dimensional normal distribution with mean 0 and variance $\frac{\epsilon}{\beta N}\mathbb{1}$. We mention that in the case $E_M(m) = \frac{1}{2}\|m\|_2^2$, the choice $\epsilon = 1$ is particularly convenient. This convolution is then known as the "Hubbard-Stratonovich transformation" [HS]. Its use simplifies that particular case to some extent and has been used frequently, by us as well as by other authors. It allows one to avoid the complications of Section 3 altogether.
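The reason the choice $\epsilon = 1$ is convenient for the quadratic energy can be checked by hand in the simplest setting. The following is a minimal sketch, assuming $M = 1$, $E_M(m) = m^2/2$, uniform spins $\sigma_i = \pm 1$ and $m(\sigma) = \frac{1}{N}\sum_i \xi_i\sigma_i$; the function names and the numerical values are illustrative only, and the common Gaussian normalization constant is dropped on both sides. Completing the square makes the exponent linear in $m(\sigma)$, so the average over spins factorizes into a product of hyperbolic cosines.

```python
import itertools
import math

def brute_force(z, xi, beta):
    """E_sigma exp(-(beta N/2)(z - m)^2) exp((beta N/2) m^2), with m = (1/N) sum_i xi_i sigma_i.

    This is the (unnormalized) density at z of the smoothed measure for eps = 1.
    """
    n = len(xi)
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        m = sum(x * s for x, s in zip(xi, sigma)) / n
        total += math.exp(-0.5 * beta * n * (z - m) ** 2 + 0.5 * beta * n * m * m)
    return total / 2 ** n

def factorized(z, xi, beta):
    """Same quantity after completing the square: exp(-beta N z^2 / 2) prod_i cosh(beta z xi_i)."""
    n = len(xi)
    return math.exp(-0.5 * beta * n * z * z) * math.prod(math.cosh(beta * z * x) for x in xi)

# Hypothetical pattern and parameters, chosen only for illustration.
xi = [1, -1, -1, 1, 1, -1, 1, 1]
beta, z = 1.3, 0.4
lhs = brute_force(z, xi, beta)
rhs = factorized(z, xi, beta)
```

The two values agree up to rounding, which is the one-dimensional shadow of the statement that the smoothed measure has an explicit density in this case.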

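We set $f^\epsilon[\omega] \equiv -\frac{1}{\beta N} \ln\left(Z_{\beta,N,M}\, Q^\epsilon_{\beta,N,M}(A_k)\right)$. But

$$\begin{aligned}
Z_{\beta,N,M}\, Q^\epsilon_{\beta,N,M}(A_k)
&= \mathbb{E}_\sigma \left(\frac{\beta N}{2\pi\epsilon}\right)^{M/2} \int_{A_k} dm\, e^{-\frac{\beta N}{2\epsilon}\|m_{N,M}(\sigma) - m\|_2^2}\, e^{\beta N E_M(m_{N,M}(\sigma))} \\
&= \mathbb{E}_\sigma \left(\frac{\beta N}{2\pi\epsilon}\right)^{M/2} \int_{A_k} dm\, \mathbb{1}_{\{\|m_{N,M}(\sigma) - m\|_2 \leq \delta\}}\, e^{-\frac{\beta N}{2\epsilon}\|m_{N,M}(\sigma) - m\|_2^2}\, e^{\beta N E_M(m_{N,M}(\sigma))} \\
&\quad + \mathbb{E}_\sigma \left(\frac{\beta N}{2\pi\epsilon}\right)^{M/2} \int_{A_k} dm\, \mathbb{1}_{\{\|m_{N,M}(\sigma) - m\|_2 > \delta\}}\, e^{-\frac{\beta N}{2\epsilon}\|m_{N,M}(\sigma) - m\|_2^2}\, e^{\beta N E_M(m_{N,M}(\sigma))} \\
&\equiv (I) + (II)
\end{aligned} \qquad (5.15)$$

for $\delta > 0$ to be chosen. We will assume that on $A_k$, $E_M$ is uniformly Lipschitz with some constant $C_L$. Then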

(I) < e +f3NC ^E a (|^) M/2 y dme-^ m ^ M ^- m ^ NEM ^ 



e pNC L S e -pNr[oj] 



and 



(5.16) 



X 



dm e~^^ mN ' M ^~ rn ^e f3NEM ^ rnN ' M< - a ^ 

M/2 



(5.17) 



< 2 M/2 -M-5 2 E J}NE M {m NM (a)) ( §N\ 
— ' a \4-7re I 



In quite a similar way we can also get a lower bound on (/), namely 

M/2 



(I) > e-^ NC ' LS E a (0) [ dme-^ m wW- m MeP NE »W 

J Ak 

_ e -f3NC L 5 2 M/2 e -££5 2 e (3N su PmeAk E M {m) 



= e -l3NC L 5 e -l3Nr[u;} _ e ~l3NC L 5 2 M /2g- e & N su P-e^ fc e m(™) 

(5-18) 

Since we anticipate that $\epsilon = N^{-\delta_1}$, the second term in (5.18) is negligible compared to the first, and (II) is negligible compared to (I), with room even to choose $\delta$ tending to zero with $N$; e.g., if we choose $\delta = \epsilon^{1/4}$, we get that

$$|\tilde f^\epsilon[\omega] - f^\epsilon[\omega]| \leq \mathrm{const.}\,\epsilon^{1/4} \qquad (5.19)$$

for sufficiently small $\epsilon$. (We assume that $|f^\epsilon[\omega]| \leq C$.)

Finally we must argue that $f^\epsilon[\omega]$ and $f^0[\omega]$ differ only by little. This follows since $\mathcal{N}\left(0, \frac{\epsilon}{\beta N}\mathbb{1}\right)$ is sharply concentrated on a sphere of radius $\sqrt{\epsilon\alpha/\beta}$ (although this remark alone would be misleading). In fact, arguments quite similar to those that yield (5.19) (and that we will not reproduce here) give also

$$|f^\epsilon[\omega] - f^0[\omega]| \leq \mathrm{const.}\,\epsilon^{1/4} \qquad (5.20)$$

Combining these observations with Proposition 5.2 gives

Theorem 5.3. Assume that $\xi$ verifies the assumptions of Theorem 4.1 and $\mathcal{S}$ is compact. Assume that $A_k \subset \mathbb{R}^M$ verifies

$$Q_{\beta,N,M}[\omega](A_k) \geq e^{-\beta N c} \qquad (5.21)$$

for some finite constant $c$, with probability greater than $1 - e^{-M}$. Then there is a finite constant $C$ such that for $\epsilon > 0$ small enough, for any $k, l$,



P 



\f N (k) - f N (l) \ > Ce 1/4 + x] < Ce~ M + Ce'^ (5.22)

6. Global estimates on the free energy function

After the rather general discussion in the last three sections, we see that all results on a specific model depend on the analysis of the (effective) rate function $\Phi_{\beta,N}[\omega](x)$. The main idea we want to follow here is to divide this analysis into two steps:

(i) Study the average $\mathbb{E}\Phi_{\beta,N}[\omega](x)$ and obtain explicit bounds from which the locations of the global minima can be read off. This part is typically identical to what we would have to do in the case of finitely many patterns.

(ii) Prove that with large probability, $|\Phi_{\beta,N}[\omega](x) - \mathbb{E}\Phi_{\beta,N}[\omega](x)|$ is so small that the deterministic result from (i) holds essentially outside small balls around the locations of the minima for $\Phi_{\beta,N}[\omega](x)$ itself.

These results then suffice to use Theorems 5.1 and 5.3 in order to construct the limiting induced measures. The more precise analysis of $\Phi$ close to the minima is of interest in its own right and will be discussed in the next section.

We mention that this strict separation into two steps was not followed in [BG5]. However, it appears to be the most natural and reasonable procedure. Gentz [G1] used this strategy in her proof of the central limit theorem, but only in the regime $M^2/N \downarrow 0$. To get sufficiently good estimates when $\alpha > 0$, a sharper analysis is required in part (ii).

To get explicit results, we will from now on work in a more restricted class of examples that includes the Hopfield model. We will take $\mathcal{S} = \{-1, 1\}$, with $q(\pm 1) = 1/2$, and $E_M(m)$ of the form

$$E_M(m) = \frac{1}{p}\|m\|_p^p \qquad (6.1)$$

with $p \geq 2$, and we will only require of the variables $\xi_i^\mu$ to have mean zero, variance one and to be bounded. To simplify notation, we assume $|\xi_i^\mu| \leq 1$. We do not strive to get optimal estimates on constants in this generality, but provide all the tools necessary to do so in any specific situation, if desired (see footnote 5).



5 A word of warning is due at this point. We will treat these generalized models assuming always $M = \alpha N$. But from the memory point of view, these models should and do work with $M = \alpha N^{p-1}$ (see e.g. [Ne] for a proof in the context of storage capacity). For $p > 2$ our approach appears perfectly inadequate to deal with so many patterns, as the description of the system in terms of so many variables (far more than the original spins!) seems quite absurd. Anyhow, there is some fun in these models

A simple calculation shows that the function of Theorem 3.5 defined in (3.17) is in this case given by (we make explicit reference to $p$ and $\beta$, but drop the $M$)

$$\Phi_{p,\beta,N}[\omega](m) = \frac{1}{q}\|m\|_p^p - \frac{1}{\beta N}\sum_{i=1}^{N} \ln\cosh\left(\beta \sum_{\mu=1}^{M} \xi_i^\mu[\omega]\, |m_\mu|^{p-1}\, \mathrm{sign}(m_\mu)\right) \qquad (6.2)$$

where $\frac{1}{p} + \frac{1}{q} = 1$ (see footnote 6). Moreover

$$\Phi_{p,\beta,N}[\omega](m) = \left(\psi_{q,\beta,N}[\omega] \circ \nabla E_M\right)(m) \qquad (6.3)$$

where $\psi_{q,\beta,N}[\omega] : \mathbb{R}^M \to \mathbb{R}$ is given by

$$\psi_{q,\beta,N}[\omega](x) = \frac{1}{q}\|x\|_q^q - \frac{1}{\beta N}\sum_{i=1}^{N} \ln\cosh\left(\beta(\xi_i[\omega], x)\right) \qquad (6.4)$$

and $\nabla E_M : \mathbb{R}^M \to \mathbb{R}^M$ by

$$\nabla E_M(m) = \left(\nabla_1 E_M(m_1), \ldots, \nabla_\mu E_M(m_\mu), \ldots, \nabla_M E_M(m_M)\right) \qquad (6.5)$$

where

$$\nabla_\mu E_M(m_\mu) = \mathrm{sign}(m_\mu)\,|m_\mu|^{p-1} \qquad (6.6)$$

Since $\nabla_\mu E_M$ is a continuous and strictly increasing function going to $+\infty$, resp. $-\infty$, as $m_\mu$ goes to $+\infty$, resp. $-\infty$ (and being zero at $m_\mu = 0$), its inverse $\nabla_\mu E_M^{-1}$ exists and has the same properties as $\nabla_\mu E_M$. It is thus enough, in order to study the structure of the minima of $\Phi_{p,\beta,N}[\omega]$, to study that of $\psi_{q,\beta,N}[\omega]$.
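The factorization (6.3) rests on the identity $(p-1)q = p$, which turns $\|\nabla E_M(m)\|_q^q$ into $\|m\|_p^p$. A minimal numerical sketch, assuming Bernoulli patterns and arbitrary illustrative values of $N$, $M$, $p$ and $\beta$ (all names below are hypothetical helpers, not notation from the text), evaluates both sides on a random point:

```python
import math
import random

def log_cosh(z):
    # Numerically stable ln cosh(z) = |z| + ln(1 + exp(-2|z|)) - ln 2.
    a = abs(z)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def grad_E(m, p):
    # Componentwise map of (6.6): sign(m_mu) |m_mu|^(p-1).
    return [math.copysign(abs(v) ** (p - 1), v) for v in m]

def psi(x, xi, beta, q):
    # Right-hand side of (6.4).
    n = len(xi)
    return sum(abs(v) ** q for v in x) / q - sum(log_cosh(beta * dot(r, x)) for r in xi) / (beta * n)

def phi(m, xi, beta, p):
    # Right-hand side of (6.2), with q the conjugate exponent of p.
    q = p / (p - 1.0)
    n = len(xi)
    x = grad_E(m, p)
    return sum(abs(v) ** p for v in m) / q - sum(log_cosh(beta * dot(r, x)) for r in xi) / (beta * n)

rng = random.Random(0)
N, M, p, beta = 50, 5, 3, 2.0          # illustrative sizes only
q = p / (p - 1.0)
xi = [[rng.choice((-1.0, 1.0)) for _ in range(M)] for _ in range(N)]
m = [rng.uniform(-1.0, 1.0) for _ in range(M)]

lhs = phi(m, xi, beta, p)
rhs = psi(grad_E(m, p), xi, beta, q)
```

Both numbers coincide up to rounding for any $m$, which is all that (6.3) claims.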

Before stating our main theorem we need to make some comments on the generalized Curie-Weiss functions

$$\phi_{q,\beta}(z) = \frac{1}{q}|z|^q - \frac{1}{\beta}\ln\cosh(\beta z) \qquad (6.7)$$

The standard Curie-Weiss case $q = 2$ is well documented (see e.g. [El]), but the general situation can be analyzed in the same way. In


even in this more restricted setting, and since this requires only a little more work, 
we decided to present those results. 

6 Throughout this section, $q$ will stand for the conjugate of $p$.

a quite general setting, this can be found in [EE]. A new feature for $q < 2$ is that now zero is always a local minimum and that there is a range of temperatures where three local minima exist while the absolute minimum is the one at zero. For sufficiently low temperatures, however, the two minima away from zero are always the lowest ones. The critical temperature $\beta_c$ is defined as the one where $\phi_{q,\beta}$ takes the same value at all three local minima. Thus a particular feature for all $q < 2$ is that for $\beta \geq \beta_c$, the position of the deepest minimum, $x^*(\beta)$, satisfies $x^*(\beta) \geq x^*(\beta_c) > 0$. Of course $x^*(\beta_c)$ tends to 0 as $q$ tends to 2. For integer $p \geq 3$ we have thus the situation that $x^*(\beta) = O(1)$, and only in the case $p = 2$ do we have to take the possible smallness of $x^*(\beta)$ near the critical point into account.
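The position $x^*(\beta)$ is a critical point of $\phi_{q,\beta}$ away from zero, i.e. the largest solution of $x^{q-1} = \tanh(\beta x)$. The following is a rough numerical sketch only: the grid scan followed by bisection is one possible way to locate that root, not a method taken from the text, and the parameter values are illustrative.

```python
import math

def phi(z, q, beta):
    # Generalized Curie-Weiss function (6.7), using a stable ln cosh.
    a = abs(beta * z)
    return abs(z) ** q / q - (a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)) / beta

def g(x, q, beta):
    # Stationarity condition for x > 0: phi'(x) = x^(q-1) - tanh(beta x).
    return x ** (q - 1.0) - math.tanh(beta * x)

def largest_solution(q, beta, grid=10000, tol=1e-12):
    """Largest x in (0, 1] with x^(q-1) = tanh(beta x); returns 0.0 if the scan finds none."""
    hi = 1.0                                  # g(1) = 1 - tanh(beta) > 0
    for k in range(grid - 1, 0, -1):          # scan downwards from 1
        lo = k / grid
        if g(lo, q, beta) < 0.0:              # sign change between lo and hi
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if g(mid, q, beta) < 0.0:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        hi = lo
    return 0.0

x_cw = largest_solution(2.0, 2.0)      # standard Curie-Weiss case q = 2
x_11 = largest_solution(2.0, 1.1)      # q = 2 at beta = 11/10
x_p3 = largest_solution(1.5, 5.0)      # q = 3/2, i.e. p = 3, at low temperature
```

For $q = 3/2$ and $\beta = 5$ the nonzero minimum lies close to 1 and is deeper than the local minimum at zero, in line with the discussion above; for $q = 2$ the root shrinks towards zero as $\beta \downarrow 1$.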

Proposition 6.1. Assume that $\xi_i^\mu$ are i.i.d., symmetric, bounded random variables with variance 1. Let either $p = 2$ or $p \geq 3$. Then for all $\beta > \beta_c(p)$ there exists a strictly positive constant $C_p(\beta)$ and a subset $\Omega_1 \subset \Omega$ with $\mathbb{P}[\Omega_1] \geq 1 - O(e^{-\alpha N})$ such that for all $\omega \in \Omega_1$ the following holds for all $x$ for which $x_\mu = \mathrm{sign}(m_\mu)|m_\mu|^{p-1}$ with $\|m\|_2 \leq 2$:
There is 7 a > and a finite numerical constant c\ such that for all 
7 < 7a i/inf SjA1 \x - se M x*|| 2 > ic x x\, 

$ v w\u\{x)-\x*tf (6.8) 

where $C_2(\beta) \sim (m^*(\beta))^2$ as $\beta \downarrow 1$, and $C_p(\beta) \geq C_p > 0$ for $p \geq 3$. The infima are over $s \in \{-1, +1\}$ and $\mu = 1, \ldots, M$.

Remark. Estimates on the various constants can be collected from the proofs. In case (i), $C_2(\beta)$ goes like $10^{-5}$, and $\gamma_a \sim 10^{-8}$ and $c_1 \sim 10^{-7}$. These numbers are of course embarrassing.

From Proposition 6.1 one can immediately deduce localization properties of the Gibbs measure with the help of the theorems in Section 5. In fact one obtains

Theorem 6.2. Assume that $\xi_i^\mu$ are i.i.d. Bernoulli random variables taking the values $\pm 1$ with equal probability. Let either $p = 2$ or $p \geq 3$. Then there exists a finite constant $c_p$ such that for all $\beta > \beta_c(p)$ there is a subset $\Omega_1 \subset \Omega$ with $\mathbb{P}[\Omega_1] \geq 1 - O(e^{-\alpha N})$ such that for all $\omega \in \Omega_1$ the following holds:

(i) In the case $p = 2$,

Qf),N,M{N)[w] (U v B C2Tm '(seV)) > 1 - exp {-KM{N)) (6.9)

(ii) In the case $p \geq 3$,

Qp,N,M(N)[w] (u S)At {m E R M \x(m) E B Cpa \ lna \(se^x* q )}) 

> 1 — exp (-KM(N)) 

(6.10) 

Moreover, for h = ese M , and any e > 0, for p = 2 

Qp,N,M(N) M (^W (se^m*)) > 1 - exp (-K(e)M(iV)) (6.11) 
and for p > 3, 

Qp,N,M(N)l"} ({™ ^R M \x(m) E B c 

p a\ In a | (se^X 

»}) 

> 1 — exp (—K(e)M(N)) 
with K(e) > const.e > 0. 

Remark. Theorem 6.2 was first proven, for the case $p = 2$, with imprecise estimates on the radii of the balls in [BGP1,BGP3]. The correct asymptotic behaviour (up to constants) given here was proven first in [BG5]. A somewhat different proof was given recently in [T4], after being announced in [T3] (with additional restrictions on $\beta$). The case $p \geq 3$ is new. It may be that the $|\ln\alpha|$ in the estimates there can be avoided. We leave it to the reader to deduce Theorem 6.2 from Proposition 6.1 and Theorems 5.1 and 5.3. In the case $p \geq 3$, Theorem 3.6 and the remark following the proof of Theorem 5.1 should be kept in mind.

Proof of Proposition 6.1. We follow our basic strategy to show first that the mean of $\psi_{q,\beta,N}[\omega]$ has the desired properties and to control the fluctuations via concentration estimates. We rewrite $\psi_{q,\beta,N}[\omega](x)$ as

$$\begin{aligned}
\psi_{q,\beta,N}[\omega](x) &= \mathbb{E}\left[\frac{1}{q}|(\xi_1, x)|^q - \frac{1}{\beta}\ln\cosh\left(\beta(\xi_1, x)\right)\right] \\
&\quad + \frac{1}{q}\|x\|_q^q - \frac{1}{q}\mathbb{E}|(\xi_1, x)|^q \\
&\quad + \frac{1}{\beta N}\sum_{i=1}^{N}\left\{\mathbb{E}\ln\cosh\left(\beta(\xi_i, x)\right) - \ln\cosh\left(\beta(\xi_i, x)\right)\right\}
\end{aligned} \qquad (6.13)$$

We will study the first, and main, term in a moment. The middle term "happens" to be positive:

Lemma 6.3. Let $\{X_j, j = 1, \ldots, n\}$ be i.i.d. random variables with $\mathbb{E}X_j = 0$, $\mathbb{E}X_j^2 = 1$, and let $x = (x_1, \ldots, x_n)$ be a vector in $\mathbb{R}^n$. Then, for $1 < q \leq 2$,

$$\|x\|_q^q - \mathbb{E}\Big|\sum_{j=1}^{n} x_j X_j\Big|^q \geq 0 \qquad (6.14)$$

Equality holds if all but one of the components $x_j$ are zero.

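Proof. A straightforward application of the Hölder inequality yields

$$\mathbb{E}\Big|\sum_{j=1}^{n} x_j X_j\Big|^q \leq \left(\mathbb{E}\Big|\sum_{j=1}^{n} x_j X_j\Big|^2\right)^{q/2} = \|x\|_2^q \qquad (6.15)$$

Since $\|x\|_2 \leq \|x\|_q$ for $q \leq 2$, this gives (6.14).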
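For symmetric Bernoulli variables the expectation in (6.14) is a finite sum, so the lemma can be checked exactly on small examples. The sketch below assumes $X_j = \pm 1$ with equal probability (one admissible choice of law, not the general case) and uses hypothetical helper names; it enumerates all sign patterns.

```python
import itertools

def expected_abs_power(x, q):
    """E |sum_j x_j X_j|^q for independent X_j = +-1 with probability 1/2 each."""
    n = len(x)
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        total += abs(sum(s * v for s, v in zip(signs, x))) ** q
    return total / 2 ** n

def gap(x, q):
    """Left-hand side of (6.14): ||x||_q^q - E |sum_j x_j X_j|^q."""
    return sum(abs(v) ** q for v in x) - expected_abs_power(x, q)

gap_generic = gap([0.7, -0.2, 0.4, 0.1], 1.5)   # strictly positive
gap_single = gap([0.9, 0.0, 0.0], 1.5)          # a single nonzero component
gap_q2 = gap([0.7, -0.2, 0.4, 0.1], 2.0)        # q = 2: both sides equal ||x||_2^2
```

In the Bernoulli case the gap vanishes when only one component is nonzero, and it also vanishes identically for $q = 2$, where both terms reduce to $\|x\|_2^2$.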



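Let us now consider the first term in (6.13). For $q = 2$ we have from [BG5] the following bound: Let

$$c(\beta) \equiv \frac{\ln\cosh(\beta x^*)}{\beta (x^*)^2} - \frac{1}{2} \qquad (6.16)$$

Then for all $\beta > 1$ and for all $z$

$$\phi_{2,\beta}(z) - \phi_{2,\beta}(x^*) \geq c(\beta)\,(|z| - x^*)^2 \qquad (6.17)$$

Moreover $c(\beta)$ tends to $\frac{1}{2}$ as $\beta \uparrow \infty$, and behaves like $\frac{1}{12}(x^*(\beta))^2$, as $\beta \downarrow 1$.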
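The two limiting regimes of $c(\beta)$ are easy to probe numerically. The sketch below assumes the form of $c(\beta)$ suggested by (6.16), namely $\ln\cosh(\beta x^*)/(\beta (x^*)^2) - 1/2$ with $x^*$ the positive solution of $x = \tanh(\beta x)$; the bisection solver and the sample values of $\beta$ are illustrative choices. With this form, $c(\beta)(x^*)^2$ is exactly $\phi_{2,\beta}(0) - \phi_{2,\beta}(x^*)$, so a quadratic lower bound with this constant is tight at $z = 0$.

```python
import math

def log_cosh(z):
    # Stable ln cosh(z).
    a = abs(z)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def phi2(z, beta):
    # Standard Curie-Weiss function, (6.7) with q = 2.
    return 0.5 * z * z - log_cosh(beta * z) / beta

def x_star(beta, tol=1e-14):
    """Positive solution of x = tanh(beta x) for beta > 1, by bisection.

    x - tanh(beta x) is negative on (0, x*) and positive on (x*, 1].
    """
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - math.tanh(beta * mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def c(beta):
    x = x_star(beta)
    return log_cosh(beta * x) / (beta * x * x) - 0.5

c_large = c(50.0)                                   # should be close to 1/2
ratio_small = c(1.01) / (x_star(1.01) ** 2 / 12.0)  # should be close to 1
depth = phi2(0.0, 2.0) - phi2(x_star(2.0), 2.0)     # equals c(2) * x*(2)^2
```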

Proposition 6.4. Assume that $\xi_i^\mu$ are i.i.d., symmetric, with $\mathbb{E}(\xi_i^\mu)^2 = 1$ and $|\xi_i^\mu| \leq 1$. Let either $p = 2$ or $p \geq 3$. Then for all $\beta > \beta_c$ (of $p$) there exists a positive constant $C_q(\beta)$ such that for all $x$ such that $x_\mu = \mathrm{sign}(m_\mu)|m_\mu|^{p-1}$ with $\|m\|_2 \leq 2$,

$$\mathbb{E}\left[\frac{1}{q}|(\xi_1, x)|^q - \frac{1}{\beta}\ln\cosh\left(\beta(\xi_1, x)\right)\right] - \frac{1}{q}(x^*)^q + \frac{1}{\beta}\ln\cosh(\beta x^*) \geq C_q(\beta) \inf_{\mu, s}\|x - s e^\mu x^*\|_2^2 \qquad (6.18)$$

where $x^*$ is the largest solution of the equation $x^{q-1} = \tanh(\beta x)$. In the case $q = 2$, $C_2(\beta)$ is a small numerical multiple of the constant $c(\beta)$ from (6.16), and hence of order $(x^*)^2$ for $\beta \downarrow 1$.

Remark. Note that nothing depends on $\alpha$ in this proposition. The constants appearing here are quite poor, but the proof is fairly nice and universal. A very recent paper [T4] has a similar result where the constant seems to be 1/256L, but so far we have not been able to figure out what his estimate for L would be. Anyway, there are other options if the proof below is not to your taste!

Proof. It is not difficult to convince oneself of the fact that there exist positive constants $C_q(\beta)$ such that for all $Z = (\xi_1, x)$ satisfying the assumption of the proposition

$$\frac{1}{q}|Z|^q - \frac{1}{\beta}\ln\cosh(\beta Z) - \frac{1}{q}(x^*)^q + \frac{1}{\beta}\ln\cosh(\beta x^*) \geq C_q(\beta)\,(|Z| - x^*)^2 \qquad (6.19)$$

For $q = 2$ this follows from (6.17). For $p \geq 3$, note first that the allowed $|Z|$ are bounded. Namely,



M 
Ai=l 



< 



(6.20) 
using that $\|\xi_1\|_\infty \leq 1$ and the Hölder inequality in the case $p > 3$. Moreover, since by definition $\pm x^*$ are the only points where the function $\phi_{q,\beta}(z)$ takes its absolute minimum, and $x^*$ is uniformly bounded away from 0, it is clear that a lower bound of the form (6.19) can be constructed on the bounded interval $[-2, 2]$.

We have to bound the expectation of the right hand side of (6.19).

Lemma 6.5. Let $Z = X + Y$ where $X$, $Y$ are independent real valued random variables. Then for any $\epsilon > 0$

$$\mathbb{E}(|Z| - x^*)^2 \geq \frac{1}{2}\left(\sqrt{\mathbb{E}Z^2} - x^*\right)^2 + \frac{1}{2}\epsilon^2\, \mathbb{P}[|X| > \epsilon]\, \min\left(\mathbb{P}[Y > \epsilon], \mathbb{P}[Y < -\epsilon]\right) \qquad (6.21)$$



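Proof. First observe that, since $\mathbb{E}|Z| \leq \sqrt{\mathbb{E}Z^2}$,

$$\mathbb{E}(|Z| - x^*)^2 = \left(\sqrt{\mathbb{E}Z^2} - x^*\right)^2 + 2x^*\,\mathbb{E}\left(\sqrt{\mathbb{E}Z^2} - |Z|\right) \geq \left(\sqrt{\mathbb{E}Z^2} - x^*\right)^2 \qquad (6.22)$$

On the other hand, Tchebychev's inequality gives that for any positive $\epsilon$

$$\mathbb{E}(|Z| - x^*)^2 \geq \epsilon^2\, \mathbb{P}\left[\big||Z| - x^*\big| > \epsilon\right] \qquad (6.23)$$

Now it is clear that if $|X| > \epsilon$, then $\big||X + Y| - x^*\big| > \epsilon$ either if $Y > \epsilon$ or if $Y < -\epsilon$ (or in both cases). This gives the desired estimate. Thus (6.23) implies that

$$\mathbb{E}(|Z| - x^*)^2 \geq \epsilon^2\, \mathbb{P}[|X| > \epsilon]\, \min\left(\mathbb{P}[Y > \epsilon], \mathbb{P}[Y < -\epsilon]\right) \qquad (6.24)$$

(6.22) and (6.24) together imply (6.21).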



In the case of symmetric random variables, the estimate simplifies 

to 

E(\Z\ - x*) 2 > - (VEZ 1 - x*Y + ^e 2 P[|X| > e]P[|y| > e] (6.25) 

which as we will see is easier to apply in our situations. In particular, we have the following estimates.

Lemma 6.6. Assume that $X = (x, \xi)$ where $|\xi^\mu| \leq 1$, $\mathbb{E}\xi^\mu = 0$ and $\mathbb{E}(\xi^\mu)^2 = 1$. Then for any $1 > g > 0$,

$$\mathbb{P}\left[|X| > g\|x\|_2\right] \geq \frac{1}{4}\left(1 - g^2\right)^2 \qquad (6.26)$$


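Proof. A trivial generalization of the Paley-Zygmund inequality [Tal] implies that for any $1 > g > 0$

$$\mathbb{P}\left[|X|^2 > g^2\,\mathbb{E}|X|^2\right] \geq \left(1 - g^2\right)^2 \frac{(\mathbb{E}X^2)^2}{\mathbb{E}X^4} \qquad (6.27)$$

On the other hand, the Marcinkiewicz-Zygmund inequality (see [CT], page 367) yields that

$$\mathbb{E}|(x, \xi)|^4 \leq 4\,\mathbb{E}\Big(\sum_\mu x_\mu^2 (\xi^\mu)^2\Big)^2 \leq 4\|x\|_2^4 \qquad (6.28)$$

while $\mathbb{E}X^2 = \|x\|_2^2$. This gives (6.26).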
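For Bernoulli $\xi^\mu$ both sides of (6.27) can be computed exactly by enumeration, which makes the chain of inequalities behind (6.26) concrete. This is a small sketch under that assumption; the vector $x$ and the value of $g$ are arbitrary illustrative choices and the function name is hypothetical.

```python
import itertools

def tail_and_moments(x, g):
    """Exact P[X^2 > g^2 E X^2], E X^2 and E X^4 for X = sum_mu x_mu xi_mu, xi_mu = +-1."""
    n = len(x)
    second = sum(v * v for v in x)          # E X^2 = ||x||_2^2 for +-1 variables
    fourth = 0.0
    hits = 0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        value = sum(s * v for s, v in zip(signs, x))
        fourth += value ** 4
        if value * value > g * g * second:
            hits += 1
    return hits / 2 ** n, second, fourth / 2 ** n

x = [0.5, 0.3, 0.2, 0.1]
g = 0.5
prob, second, fourth = tail_and_moments(x, g)
paley_zygmund = (1.0 - g * g) ** 2 * second ** 2 / fourth   # right-hand side of (6.27)
quarter_bound = 0.25 * (1.0 - g * g) ** 2                   # right-hand side of (6.26)
```

In this example the exact probability is $5/8$, comfortably above the Paley-Zygmund bound, which in turn dominates the cruder bound of (6.26) because $\mathbb{E}X^4 \leq 4\|x\|_2^4$.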



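Combining these two results we arrive at

Lemma 6.7. Assume that $Z = (x, \xi)$ with $\xi^\mu$ as in Lemma 6.6 and symmetric. Let $I \subset \{1, \ldots, M\}$ and set $\hat x_\mu = x_\mu$ if $\mu \in I$, $\hat x_\mu = 0$ if $\mu \notin I$. Put $\tilde x = x - \hat x$. Assume $\|\hat x\|_2 \geq \|\tilde x\|_2$. Then

$$\mathbb{E}(|Z| - x^*)^2 \geq \frac{1}{2}\left(\|x\|_2 - x^*\right)^2 + \frac{1}{500}\|\tilde x\|_2^2 \qquad (6.29)$$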



Proof. We put e = y||x||2 in (6.25) and set g 2 = |. Then Lemma 6.6 
gives the desired bound. 



Lemma 6.8. Let $Z$ be as in Lemma 6.7. Then there is a finite positive constant $c$ such that

$$\mathbb{E}(|Z| - x^*)^2 \geq c \inf_{\mu, s}\|x - s e^\mu x^*\|_2^2 \qquad (6.30)$$

where c > 

Proof. We assume w.r.g. that x>|x2| > \x$\ > ... > \xm\ 
and distinguish three cases. Case 1: x\ > ^H^Hl- Here we set 
x = (0, x 2 , ■ ■ ■ , xm)- We have that 

II 1 * ||2 n - ||2 , / *\2 

\\x — e x \\ 2 = \\x\\ 2 + {xi — x ) 

< \\x\\\ + 2(xi - ||x|| 2 ) 2 + 2(||x|| 2 - x*) 2 (6.31) 

< 3\\x\\ 2 + 2(\\x\\ 2 - x*) 2 

Therefore (6.29) yields 

- (\\x\\ 2 ~ x*) 2 + —\\x\\ 2 > (3||x||! + 1500/2(||d| 2 - x*) 2 ) 

2 v " 11 1 500 11 112 ~ 3-500 V " 112 ' V " 11 ; ; 



\ 1 II 1 *\\2 

> \\x — e x 9 

- 3-500" 112 



(6.32)

which is the desired estimate in this case.

Case 2: x\ < ^H^Hl, x\ > \\\x\\2- Here we may choose x = 
(0, x 2 , 0, . . . , 0). We set x = (0, 0, £3, ... , xm)- Then 

\\x - e 1 x*\\l < (xi - a;*) 2 + ||x||2 + ||£|| 2 (6.33) 

But < \\x\\ 2 - ^\\x\\l < 2\\x\\ 2 and 

(x! - x*) 2 < (§|M| 2 - x*) 2 < 2(||x|| 2 - x*) 2 + l\\ x \\l^-^x*\\x\\ 2 
<2(\\x\\ 2 -x*) 2 + 2\\x\\ 2 

(6.34) 

Thus \\x — e 1 x * HI < 4||x||| + 2(||x|| 2 -x* ) 2 , from which follows as above 
that 

- (\\x\\ 2 - x*f + —\\x\\ 2 2 > —^—\\x - eV||! (6.35) 
2 v " " ; 500" 112 ~ 4-500" 112 V ; 



Case 3: x\ < |||;r|| 2 , x 2 < ^ 1 1 ^ 1 1 2 - I n this case it is possible 
to find 1 < t < M such that x = (x±, x 2 , . . . , xt, 0, . . . , 0) and 
x = (0, . . . ,0,x f+ i, . . . ,xm) satisfy ||£|| 2 | < ilMli- In par- 

ticular, \\x\\l < f pill, and (x*) 2 < 2(\\x\\ 2 - x*) 2 + 2||x||§ < 2(||x|| 2 - 
x*) 2 + f \\x\\ 2 2 . Thus 

\\x - e 1 x*\\ 2 2 < (x*) 2 + \\x\\l + \\x\\l < 2(\\x\\ 2 - x*) 2 + 8\\x\\l (6.36) 
and thus 

- (\\x\\ 2 - x*) 2 + —\\x\\ 2 > — ^||;r-eV|| 2 (6.37) 

2 v " 11 ; 500" 112 ~ 8-500" 112 V ; 

Choosing the worst estimate for the constants of all three cases proves the lemma. Proposition 6.4 follows by putting all together.

We thus want an estimate on the fluctuations of the last term in the r.h.s. of (6.13). We will do this uniformly inside balls $B_R(x) \equiv \{x' \in \mathbb{R}^M \mid \|x - x'\|_2 \leq R\}$ of radius $R$ centered at the point $x \in \mathbb{R}^M$.

Proposition 6.9. Assume $\alpha \leq 1$. Let $\{\xi_i^\mu\}_{i=1,\ldots,N;\,\mu=1,\ldots,M}$ be i.i.d. random variables taking values in $[-1, 1]$ and satisfying $\mathbb{E}\xi_i^\mu = 0$, $\mathbb{E}(\xi_i^\mu)^2 = 1$. For any $R < \infty$ and $x_0 \in \{s m^* e^\mu, s = \pm 1, \mu = 1, \ldots, M\}$ we have:

i) For $p = 2$ and $\beta < 11/10$, there exist finite numerical constants $C$, $K$ such that (see footnote 7)


P 



sup 

xeB R (x ) 



1 N 

— {E In cosh(/3 (& ,x) ) - In cosh(/?(& , x) ) } 



i=i 



> CVaR(m* +R)+ Cam* + 4a 3 (m* + i?) 



<ln(^)e-^ + -^ 
ii) For $p \geq 3$ and $\beta > \beta_c$, and for $p = 2$ and $\beta \geq 11/10$,



(6.38) 



P 



sup 

xEB R (x ) 



N 



— ^{Elncosh(/3(6,x))-lncosh(/3(6,x))} 



> C\/aR(R + \\xoh) + Ca + 4a/ 



<ln(^)e- aN + ^ N 



(6.39) 



Proof. We will treat the case (i) first, as it is the more difficult one. To prove Proposition 6.9 we will have to employ some quite heavy machinery, known as "chaining" in the probabilistic literature (see footnote 8 and [LT]; we follow closely the strategy outlined in Section 11.1 of that book). Our problem is to estimate the probability of a supremum over an $M$-dimensional set, and the purpose of chaining is to reduce this to an estimate of suprema over countable (in fact finite) sets. Let us use in the following the abbreviations $f(z) \equiv \beta^{-1}\ln\cosh(\beta z)$ and $F(\xi, x) \equiv \frac{1}{N}\sum_{i=1}^{N} f((\xi_i, x))$. We denote by $W_{M,r}$ the lattice in $\mathbb{R}^M$ with spacing $r/\sqrt{M}$. Then, for any $x \in \mathbb{R}^M$ there exists a lattice point $y \in W_{M,r}$ such that $\|x - y\|_2 \leq r$. Moreover, the cardinality of the set of lattice points inside the ball $B_R(x_0)$ is bounded by (see footnote 9)

$$\left|W_{M,r} \cap B_R(x_0)\right| \leq e^{\alpha N[\ln(R/r) + 2]} \qquad (6.40)$$

7 The absurd number 11/10 is of course an arbitrary choice. It so happens that, numerically, $m^*(1.1) \approx 0.5$, which seemed like a good place to separate cases.

8 Physicists would more likely call this "coarse graining" or even "renormalization".

9 For the (simple) proof see [BG5].

We introduce a set of exponentially decreasing numbers $r_n = e^{-n}R$ (this choice is somewhat arbitrary and maybe not optimal) and set $W(n) \equiv W_{M,r_n} \cap B_{r_{n-1}}(0)$. The point is that if $r_0 = R$, any point $x \in B_R(x_0)$ can be subsequently approximated arbitrarily well by a sequence of points $k_n(x)$ with the property that

$$k_n(x) - k_{n-1}(x) \in W(n) \qquad (6.41)$$

As a consequence, we may write, for any $n^*$ conveniently chosen,

$$\begin{aligned}
|F(\xi, x) - \mathbb{E}F(\xi, x)| &\leq |F(\xi, k_0(x)) - \mathbb{E}F(\xi, k_0(x))| \\
&\quad + \sum_{n=1}^{n^*}\left|F(\xi, k_n(x)) - F(\xi, k_{n-1}(x)) - \mathbb{E}\left(F(\xi, k_n(x)) - F(\xi, k_{n-1}(x))\right)\right| \\
&\quad + \left|F(\xi, x) - F(\xi, k_{n^*}(x)) - \mathbb{E}\left(F(\xi, x) - F(\xi, k_{n^*}(x))\right)\right|
\end{aligned} \qquad (6.42)$$
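The successive approximations $k_n(x)$ entering (6.41) and (6.42) can be built greedily. The sketch below is one possible construction and an assumption of this example, not necessarily the one intended in the text: at step $n$ the current residual $x - k_{n-1}(x)$ is rounded coordinatewise to the lattice of spacing $r_n/\sqrt{M}$. Coordinatewise rounding leaves an error of at most $r_n/2$ in Euclidean norm, so each increment has norm at most $r_{n-1}/2 + r_n/2 \leq r_{n-1}$ and therefore lies in $W(n)$.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def nearest_lattice_point(v, r):
    """Round v coordinatewise to the cubic lattice of spacing r / sqrt(M)."""
    h = r / math.sqrt(len(v))
    return [h * round(c / h) for c in v]

def chain(x, x0, R, n_star):
    """Greedy approximations k_0(x), ..., k_{n*}(x) with r_n = exp(-n) R; returns the last point and all increments."""
    approx = list(x0)
    increments = []
    for n in range(n_star + 1):
        r_n = math.exp(-n) * R
        residual = [a - b for a, b in zip(x, approx)]
        step = nearest_lattice_point(residual, r_n)
        approx = [a + s for a, s in zip(approx, step)]
        increments.append(step)
    return approx, increments

rng = random.Random(0)
M, R, n_star = 6, 1.0, 8                          # illustrative values only
x0 = [0.5] + [0.0] * (M - 1)                      # a point of the form m* e^1
direction = [rng.gauss(0.0, 1.0) for _ in range(M)]
scale = 0.9 * R / norm(direction)
x = [a + scale * d for a, d in zip(x0, direction)]    # a point of B_R(x0)

k_last, increments = chain(x, x0, R, n_star)
final_error = norm([a - b for a, b in zip(x, k_last)])
```

After $n^*$ steps the remaining distance is at most $r_{n^*}/2$, which is why the last term in (6.42) can be made negligible by taking $n^*$ large.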

At this point it is useful to observe that the functions F(£, x) have some 
good regularity properties as functions of x. 



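Lemma 6.10. For any $x \in \mathbb{R}^M$ and $y \in \mathbb{R}^M$,

$$\frac{1}{\beta N}\left|\sum_{i=1}^{N}\left\{\ln\cosh\left(\beta(\xi_i, x)\right) - \ln\cosh\left(\beta(\xi_i, y)\right)\right\}\right| \leq
\begin{cases}
\beta\,\|x - y\|_2 \max(\|x\|_2, \|y\|_2)\,\|A\| & \text{if } \beta < 11/10 \\
\|x - y\|_2\,\|A\|^{1/2} & \text{if } \beta \geq 11/10
\end{cases} \qquad (6.43)$$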

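Proof of Lemma 6.10. Defining $F$ as before, we use the mean value theorem to write that, for some $0 < \theta < 1$,

$$\begin{aligned}
|F(\xi, x) - F(\xi, y)| &= \left|\frac{1}{N}\sum_{i=1}^{N}(x - y, \xi_i)\, f'\left((\xi_i, x + \theta(y - x))\right)\right| \\
&\leq \left[\frac{1}{N}\sum_{i=1}^{N}(x - y, \xi_i)^2\right]^{1/2}\left[\frac{1}{N}\sum_{i=1}^{N}\left(f'\left((\xi_i, x + \theta(y - x))\right)\right)^2\right]^{1/2}
\end{aligned} \qquad (6.44)$$

By the Schwarz inequality we have

$$\frac{1}{N}\sum_{i=1}^{N}(x - y, \xi_i)^2 \leq \|x - y\|_2^2\,\|A\| \qquad (6.45)$$

To treat the last term in the r.h.s. of (6.44) we will distinguish the two cases $\beta < \frac{11}{10}$ and $\beta \geq \frac{11}{10}$.

1) If $\beta < \frac{11}{10}$ we use that $|f'(x)| = |\tanh(\beta x)| \leq \beta|x|$ to write

$$\begin{aligned}
\frac{1}{N}\sum_{i=1}^{N}\left(f'\left((\xi_i, x + \theta(y - x))\right)\right)^2 &\leq \beta^2\,\frac{1}{N}\sum_{i=1}^{N}(\xi_i, x + \theta(y - x))^2 \\
&\leq \beta^2\,\frac{1}{N}\sum_{i=1}^{N}\left((1 - \theta)(x, \xi_i)^2 + \theta(y, \xi_i)^2\right) \\
&\leq \beta^2 \max\left(\|x\|_2^2, \|y\|_2^2\right)\|A\|
\end{aligned} \qquad (6.46)$$

which, together with (6.44) and (6.45), yields

$$|F(\xi, x) - F(\xi, y)| \leq \beta\,\|x - y\|_2 \max(\|x\|_2, \|y\|_2)\,\|A\| \qquad (6.47)$$

2) If $\beta \geq \frac{11}{10}$ we use that $|f'(x)| \leq 1$ to get

$$|F(\xi, x) - F(\xi, y)| \leq \|x - y\|_2\,\|A\|^{1/2} \qquad (6.48)$$

This concludes the proof of Lemma 6.10.

■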
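The two regularity bounds of Lemma 6.10 can be probed on a random instance. The sketch below assumes that $A$ denotes the $M \times M$ matrix $\frac{1}{N}\sum_i \xi_i\xi_i^T$, which is the reading under which (6.45) holds; its operator norm is estimated by plain power iteration, which approaches the true norm from below, so the comparison is indicative rather than a proof. All sizes and names are illustrative.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def log_cosh(z):
    a = abs(z)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def F(xi, x, beta):
    # F(xi, x) = (1/N) sum_i f((xi_i, x)) with f(z) = ln cosh(beta z) / beta.
    return sum(log_cosh(beta * dot(row, x)) for row in xi) / (beta * len(xi))

def quad_form(xi, v):
    # v^T A v with A = (1/N) sum_i xi_i xi_i^T (assumed meaning of A).
    return sum(dot(row, v) ** 2 for row in xi) / len(xi)

def op_norm(xi, iters=300):
    """Power-iteration estimate of ||A||."""
    rng = random.Random(1)
    v = [rng.gauss(0.0, 1.0) for _ in xi[0]]
    for _ in range(iters):
        proj = [dot(row, v) for row in xi]
        w = [sum(p * row[mu] for p, row in zip(proj, xi)) for mu in range(len(v))]
        s = norm(w)
        v = [c / s for c in w]
    return quad_form(xi, v)

rng = random.Random(0)
N, M = 200, 10
xi = [[rng.choice((-1.0, 1.0)) for _ in range(M)] for _ in range(N)]
x = [rng.uniform(-0.5, 0.5) for _ in range(M)]
y = [rng.uniform(-0.5, 0.5) for _ in range(M)]
d = [a - b for a, b in zip(x, y)]
A_norm = op_norm(xi)

gap_small_beta = abs(F(xi, x, 1.05) - F(xi, y, 1.05))
bound_small_beta = 1.05 * norm(d) * max(norm(x), norm(y)) * A_norm   # case beta < 11/10
gap_large_beta = abs(F(xi, x, 2.0) - F(xi, y, 2.0))
bound_large_beta = norm(d) * math.sqrt(A_norm)                        # case beta >= 11/10
```

For generic directions $x - y$ both inequalities hold with considerable room, since the quadratic form in (6.45) is then well below its worst-case value $\|x - y\|_2^2\|A\|$.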

Lemma 6.10 implies that the last term in (6.42) satisfies

$$\left|F(\xi, x) - F(\xi, k_{n^*}(x)) - \mathbb{E}\left(F(\xi, x) - F(\xi, k_{n^*}(x))\right)\right| \leq \mathrm{const.}\, r_{n^*} \qquad (6.49)$$

which can be made irrelevantly small by choosing, e.g., $r_{n^*} = \alpha^3$. From this it follows that for any sequence of positive real numbers $t_n$ such that $\sum_{n=0}^{\infty} t_n \leq \bar t$ we have the estimate



P 



sup |F(£, x) - EF(£, x)| > * + £ + r n * ||x|| 2 (||A|| + E\\A\\) 

xeB R (x ) 



<F[\F(Z,x )-EF(Z,x )\>t\ 



+ P 



sup |F(£,fc (x))-F(£,x ) 

x€B R (x ) 

-E(FK,Mi))-F((.io))| >*o 



n=l 



sup 

i£Br(io) 

E(F(£, fe n (x)) — F(£, fe n _i(x)))| >t n 



<P[|F(e,x )-EF(e,x )| >*] 

+ e M[ln ^ + %[||F(£,*)-EF(^)|>t ] 



n=l 



,M[ln£+2] 



P 



|F(e,A;n(^))-F(e,A;n-i(^)) 



- e(f(c, fc„(x)) - fc„_i(x)))i > * T 



where we used that the cardinality of the set 



Cnrd\\F{Z,k n {x)) - F^kn-^x)) 



(6.50) 



- E(F(£, k n {x)) - F(£, fc„-i(x)))| ; x e S fl (x )| ( 6 - 51 ) 
< Cax^WM.r™-! n S fl (xo)} < exp (m[1u £ + 2]) 

We must now estimate the probabilities occurring in (6.50); the first one is simple and could be bounded by using Talagrand's Theorem 6.6 cited in Section 4. Unfortunately, for the other terms this does not seem possible since the functions involved there do not satisfy the hypothesis of convex level sets. We thus proceed by elementary methods, exploiting the particularly simple structure of the functions $F$ as sums over independent terms. Thus we get from the exponential Tchebychev inequality that

P [F(£, x) - F(£, y) - E[F(£, x) - F(£, y)] > 5] 

JV 

< inf e" 5s TTEe + ^ (/(( ^' x)) " /(( ^' y)) " E[/((?I ' x)) " /((?I ' y))]) 

s>0 J-l 



< inf e~ Ss TT 

s>0 1 J - 



i=l 



l+2^E(/((e 4 ,x))-/((e 4 ,y)) 



- E[/((&, x)) - /((&, y))]) 2 e *l/(«- !B ))-/(«*.w))- E [/(«'. a '))-/(«'.f))]l 

(6.52) 

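We now use that both $|\tanh(\beta x)| \leq 1$ and $|\tanh(\beta x)| \leq \beta|x|$ to get that

$$|f((\xi_i, x)) - f((\xi_i, y))| \leq |(\xi_i, (x - y))| \max_z |f'((\xi_i, z))| \leq |(\xi_i, (x - y))| \qquad (6.53)$$

and

$$|f((\xi_i, x)) - f((\xi_i, y))| \leq |(\xi_i, (x - y))| \max_z |f'((\xi_i, z))| \leq \beta\,|(\xi_i, (x - y))| \max\left(|(\xi_i, x)|, |(\xi_i, y)|\right) \qquad (6.54)$$

The second inequality will only be used in the case $p \geq 3$ and if $\beta < 1.1$. Using the Schwarz inequality together with (6.53) we get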



E (/((&, *)) - /((&, y)) - E[/((fc, x)) - /((C*, y))])' 

,*))- 

1/2 



x e ^l/((€i,*))-/(«i,»))-E[/(«i,x))-/((€i,y))]| 



< 



8E(/((6,x))-/((6,2/))r 



x 



']EeTrl/((€i.*))-/(«i.tf))-E(/((€i,*))-/(«i,tf)))l 



1/2 



(6.55) 



< [E(&, x - y) 4 ] 1/2 Ee^l^'^-^l 



1/2 



,-kE\(ii,{x-y))\ 



Using (6.54) and once more the Schwarz inequality we get an alternative 
bound for this quantity by 



V8(3 2 [E(6,x-y) 8 ] 1/4 max([E(6,x) 8 ] 1/4 [E&, y)*} 1/4 ) 



x 



Eeft I (&,(*-»))! 1/2 e £ E K£*>(*-i/))l 



(6.56)

The last line is easily bounded using essentially Khintchine resp. Marcinkiewicz-Zygmund inequalities (see [CT], pp. 366 ff.), in particular



cc || 2 reps. 2/ v 2 if £f are Bernoulli 

2 

Ee*!^*'*)! < 2e^ c with c = 1 if £f are Bernoulli 
(x - y)f k < 2 2k k k \\{x - y)\\f no 2 2k if £f are Bernoulli 



(6.57) 



Thus 



E (/((&, *)) - /((&, y)) - E[/((fc, *)) - /((&, y))]f 

x e ^l/((Ci,*))-/((€i,I/))-E[/(« i ,a ! ))-/(« i ,y))]| 

< >/8>/32||(a: - y )g e ^ x - yh+c ^ x - y ^ (6.58) 
respectively 

< P 2 V82H 2 \\X -y\\ 2 (\\ X \\ 2 + \\y\\ 2 ) 2 e wV2\\ X -yh+c^\\ x -y\\l 



In the Bernoulli case the constants can be improved to 2^/8 and V84 2 , 
resp., and c = 1. 

Inserting (6.58) into (6.52), using that $1 + x \leq e^x$ and choosing $s$ gives the desired bound on the probabilities. The trick here is not to be tempted to choose $s$ depending on $\delta$. Rather, depending on which
bound we use, we choose s = or s = -n M ,m/^ . M M ; . This 

' IF-J/Ih lF-J/||2||(||a;||2 + ||j/||2) 

gives 



P x) - F(£, y) - E[F(£, x) - y)] > d] 

< exp (_ N y^L + 8aNe v^+2cA 



(6.59) 



respectively 



P [F(£, x) - F(£, ?/) - E[F(£, x) - F(£, y)] > 5] 

(, — I ca. \ 
-N Tl n ,,, I, x + aN(3 2 V82 4 4 2 e M ^ +M ^ ^ + M 2 y' 
\\x-vh(\\xh + \\vh) y 



(6.60)

In particular



P 



< 



\F(^k n (x))-F(^k n ^(x)) 
-E[|F(e,fe„(x))-F(e,fe n _i(x))]| >t r 
2 eX p [-N 1 ^ + NSaNe^ 2 ^ 



(6.61) 



and 



P 



|F(£,/c n (aO)-F(£,fcn-iOr)) 



E[\F(£,k n (x)) - F^kn-tix))}] >t n 

— — +aiV/3 2 \/82 4 4 2 e« +l|a! o || 2 + (ii-oii2+«)^ 

r n _i(i2+||x || 2 ) / 

(6.62) 

We also have 



< 



P [|F(£, feo(x)) - F(£, x ) - E [F(£, feo(x)) - F(C, x )]| > to] 



< 



2exp ("^^ + 8«iVe^+ 2c ^ 



(6.63) 



and 



P [|F(£, feo(x)) - F(£, x ) - E [F(£, fco(x)) - F(£, x )]| > to] 

< 2 exp \-N—J^- + aN p 2 V82 4 4 2 e R +^>\\2 + urfW ) 

~ P \ R(\\x h + R) ^ J 



Since ||xo||2 + R > s ° that r+\\^ \\ 2 + 



(6.64) 

C7 2 (x*)2 < c'7- Thus in the case /5 < 1.1 we may choose to and t n 
as 



to = V^R{\\xoh + R)\l + 2 + /3 2 2 4 4 2 v / 8e c '^ + 1 



(6.65) 



and 



t n = V^e-^-^R(\\xoh + R) \n + 2 + /3 2 2 4 4 2 e c ' 7 + ll (6.66)

Finally a simple estimate gives that (for $x_0 = \pm m^* e^\mu$)



P [\F(£, x ) - EF(f , x )| > t] < 2e 2 °(™*) ; 



-N 



(6.67) 



Choosing $t = m^*\alpha$, setting $n^* = \ln\left(R/\alpha^3\right)$, and putting all this into (6.50) we get that



P 



sup \F(Z,x)-EF(Z,x)\ > 
xeB R (x ) 



sfaR{\\x \\ 2 + R ) I 4 + P 2 2 4 4 2 V8e c '~< 

+ 3 + ^ + /3 2 2 4 4V'^ + m*a + a 3 (\\x \\ 2 + R) 



(6.68) 



e + 2e" a 27V / 20 



This proves part (i) of Proposition 6.9 and allows us to estimate the 
constant C in (6.38). In the same way, but using (6.61) and (6.63), we 
get the analogous bound in case (ii), namely 



P 



sup \F(Z,x)-EF(£,x)\> 

xeB R {x ) 



^R{\\x Q \\ 2 + R){4 + 8e c ' 7 + 3 + ^ + 8e c ' 7 ) + 4a 3 



(6.69) 



< In [ % ] e 



-aN 



This concludes the proof of Proposition 6.9. 



Remark. The reader might wonder whether the heavy-looking chaining
machinery used in the proof of Proposition 6.9 is really necessary.
Alternatively, one might use just a single lattice approximation and
use Lemma 6.10 to estimate how far the function can be from the lattice
values. But for this we need at least a lattice with $r = \sqrt{\alpha}$, and
this would force us to replace the $\sqrt{\alpha}$ terms in (6.38) and (6.39) by
$\sqrt{\alpha|\ln\alpha|}$. While this may not look too serious, it would certainly spoil
the correct scaling between the critical $\alpha$ and $\beta - 1$ in the case $p = 2$.
We are now ready to conclude the proof of Proposition 6.1. To
do this, we consider the $2M$ sectors in which $s x^* e^\mu$ is the closest of
all the points $s x^* e^\nu$ and use Proposition 6.9 with $x_0 = s e^\mu x^*$ and $R$
the distance from that point. One sees easily that if that distance
is sufficiently large (as stated in the theorem), then with probability
exponentially close to one, the modulus of the last term in (6.13) is
bounded by one half of the lower bound on the first term given by
Proposition 6.4. Since it is certainly enough to consider a discrete set
of radii (e.g. take $R \in \mathbb{Z}/N$), and the individual estimates fail only
with a probability of order $\exp(-\alpha N)$, it is clear that the estimates on
ip hold indeed uniformly in x with probability exponentially close to 
one. This concludes the proof of Proposition 6.1. 
7. Local analysis of $\Phi$



To obtain more detailed information on the Gibbs measures requires
a closer look at the behaviour of the functions $\Phi_{p,\beta,N}(m)$ in the
vicinities of the points $\pm m^*(\beta) e^\mu$. Such an analysis was first
performed in the case of the standard Hopfield model in [BG5]. The basic
idea was simply to use second order Taylor expansions combined with
careful probabilistic error estimates. One can certainly do the same in
the general case with sufficiently smooth energy function $E_M(m)$, but
since results (and to some extent techniques) depend on specific
properties of these functions, we restrict our attention again to the
cases where $E_M(m) = \frac{1}{p}\|m\|_p^p$ with $p \geq 2$ integer, as in
the previous section. For reasons that will become clear in a moment,
the (most interesting) case $p = 2$ is special, and we consider first the
case $p \geq 3$. Also throughout this section, the $\xi_i^\mu$ take the values $\pm 1$.

7.1. The case $p \geq 3$.

As a matter of fact, this case is "misleadingly simple" 7. Recall
that we deal with the function $\Phi_{p,\beta,N}(m)$ given by (6.2). Let us
consider without restriction of generality the vicinity of $m^* e^1$. Write
$m = m^* e^1 + v$ where $v$ is assumed "small", e.g. $\|v\|_2 \leq \epsilon < m^*$. We
have to consider mainly the regions over which Proposition 6.1 does not
give control, i.e. where $\|\mathrm{sign}(m)|m|^{p-1} - e^1 (m^*)^{p-1}\|_2 \leq c_1\sqrt{\alpha}$ (recall
(6.6)). In terms of the variable $v$ this condition implies that both
$|v_1| \leq C\sqrt{\alpha}$ and $\|\hat v\|_{2p-2}^{p-1} \leq C\sqrt{\alpha}$ for some constant $C$ (depending on
$p$), where we have set $\hat v = (0, v_2, v_3, \ldots, v_M)$. Under these conditions
we want to study

$$\Phi_{p,\beta,N}(m^* e^1 + v) - \Phi_{p,\beta,N}(m^* e^1) = \frac{1}{q}\left((m^* + v_1)^p - (m^*)^p + \|\hat v\|_p^p\right)$$
$$\quad - \frac{1}{\beta N}\sum_{i=1}^N \left[\ln\cosh\Big(\beta\Big((m^* + v_1)^{p-1} + \sum_{\mu \geq 2} \hat\xi_i^\mu\, \mathrm{sign}(v_\mu)|v_\mu|^{p-1}\Big)\Big) - \ln\cosh\big(\beta (m^*)^{p-1}\big)\right]$$ (7.1)

where we have set $\hat\xi_i^\mu = \xi_i^1 \xi_i^\mu$. The crucial point is now that we can
expand each of the terms in the sum over $i$ without any difficulty: for
$|(m^* + v_1)^{p-1} - (m^*)^{p-1}| \leq |v_1|(p-1)(m^* + |v_1|)^{p-2} \leq C|v_1|$, and, more
importantly, the Hölder inequality gives

$$\Big|\sum_{\mu \geq 2} \hat\xi_i^\mu\, \mathrm{sign}(v_\mu)|v_\mu|^{p-1}\Big| \leq \|\hat v\|_2^2\, \|\hat v\|_\infty^{p-3}$$ (7.2)

7 But note that we consider only the case $M \sim \alpha N$ rather than $M \sim \alpha N^{p-1}$.



As explained earlier, we need to consider only $v$ for which $\|v\|_2 \leq 2$,
and $\|\hat v\|_\infty \leq \|\hat v\|_{2p-2} \leq (C\sqrt{\alpha})^{1/(p-1)}$ is small on the set we consider.
Such a result does not hold if $p = 2$, and this makes the whole analysis
much more cumbersome in that case, as we shall see.

What we can already read off from (7.1) is that $v_1$ and
$\hat v$ enter in a rather asymmetric way. We are thus well-advised to treat
$|v_1|$ and $\|\hat v\|_2$ as independent small parameters. Expanding, and using
that $m^* = \tanh(\beta (m^*)^{p-1})$, gives therefore



$ P ,/?,iv(mV +v)- $ p ,p, N (m*e L ) 
= vf-^{m*y-* [I - (3(1 - (m*) 2 )(m*r 2 (p - 1)] 

+ -\H P P - f (1 " (rn*) 2 ) (s&(v)\v\*-\ ^ signer 1 



1 N 

i=l 



m* + ^(l-(m*) 2 )(m*) p ~V 



+ R(v) 



where 



(7.3) 



w „)l< w .l£zP<P-2)(p-a) 



6 



(m* + M) p ~ 3 



+ 



2 9/4 



1 * 

\ Vl \ 3 (p - l) 3 (m* + |e|) 3 ^- 2 ) + - sign (-0) j -0 1 ^~ 1 ) | 



x 



2/3 2 tanh/3 ((m* + jvij 



+ v 



N 

i=l 
ISo" 3 ) 



cosh 2 /? ((m* - I^iDp- 1 - ||v||llH|oo 3 ) 



(7.4) 

where the last factor is easily seen to be bounded uniformly by some
constant, provided $|v_1|$ and $\|v\|_2$ are small compared to $m^*(\beta)$. Recall
that the latter is, for $\beta > \beta_c$, bounded away from zero if $p \geq 3$. (Note
that we have used that for positive $a$ and $b$, $(a + b)^3 \leq 2^{9/4}(a^3 + b^3)$.)
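The leading quadratic coefficient in $v_1$ can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes $\hat v = 0$, so that the sum over $i$ in (7.1) collapses to a single deterministic term, and it assumes the conjugate exponent $q = p/(p-1)$. The function names and the parameter values $\beta = 3$, $p = 3$ are illustrative choices only. It compares a symmetric finite difference of (7.1) with the coefficient $\frac{p-1}{2}(m^*)^{p-2}[1 - \beta(1-(m^*)^2)(m^*)^{p-2}(p-1)]$.

```python
import math

def mattis_magnetization(beta, p, iterations=200):
    """Largest solution of m = tanh(beta * m**(p-1)), by iteration from m = 1."""
    m = 1.0
    for _ in range(iterations):
        m = math.tanh(beta * m ** (p - 1))
    return m

def delta_phi_first_direction(v1, beta, p, m_star):
    """Right-hand side of (7.1) restricted to v = (v1, 0, ..., 0)."""
    q = p / (p - 1.0)  # assumed conjugate exponent, 1/p + 1/q = 1
    energy = ((m_star + v1) ** p - m_star ** p) / q
    entropy = (math.log(math.cosh(beta * (m_star + v1) ** (p - 1)))
               - math.log(math.cosh(beta * m_star ** (p - 1)))) / beta
    return energy - entropy

def quadratic_coefficient(beta, p, m_star):
    """Coefficient of v1**2 obtained from the second order Taylor expansion."""
    return 0.5 * (p - 1) * m_star ** (p - 2) * (
        1.0 - beta * (1.0 - m_star ** 2) * m_star ** (p - 2) * (p - 1))

beta, p = 3.0, 3          # illustrative values, not taken from the text
m_star = mattis_magnetization(beta, p)
h = 1e-3
# The symmetric combination cancels the odd (cubic) part of the remainder.
numeric = (delta_phi_first_direction(h, beta, p, m_star)
           + delta_phi_first_direction(-h, beta, p, m_star)) / (2.0 * h * h)
print(m_star, numeric, quadratic_coefficient(beta, p, m_star))
```

The linear term in $v_1$ drops out only because $m^*$ satisfies the fixed point equation, which is why the sketch iterates that equation to convergence before differencing.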
Note further that



sign (v)\v\p-\^ sign (^r 1 ) < ||A(iV)|| J>J> 



(7.5) 



< \\A(N)\\\\v\\mr- 2 



\p\\ " MOO 



and 



AT AT 

^ sign^l^r 1 )! 3 < ^ sign^l^r 1 )! 



i=l 



<ll^(AOIIIHI^IIr 5 ll£ 



IP-3 



(7.6) 



so that in fact 



$ P ,/3,jv(mV + v) - ^^(mV) 

= vl^(m*) p - 2 [l - 0(1 - (m*) 2 )(m*) p - 2 (p ~ 1)] + -\H P P 



2 

A" 



q 



signer 1 ) 

i=l 



p. 



m* + -(1 - {m*) 2 ){m*) p - 2 v 1 
2 



+ i?(f) 

(7.7) 

where \R(v)\ < c ({v^ 3 + |H|£|H|^ 2 ) . 

These bounds give control over the local minima near the Mattis 
states. In fact, we can compute easily the first corrections to their 
precise (random) positions. The approximate equations for them have 
the form 



vi = ci(/?)-=(z, sign (t)) |-0| p x ) 
v J\ 



1 



(7.J 



z^m* + c 2 (/5)fi), for /i 7^ 1 



where z M = ^ £f and ci(/3), C2(/3) are constants that can be read 
off (7.7). These equations are readily solved and give 



vi 



N 



N 



z u (m + 



ClC 2 



N 



X 



(7.9) 
where X is the solution of the equation



X = 



jV(p-l)/2 H~NP 



z\\l I m* + C ^=X 



(7.10) 



Note that for N large, 



M (l-(-l)f)2P/ 2 r(i±£) 



(7.11) 



jV(p-i)/2 2v^F 
Moreover, an estimate of Newman ([N], Proposition 3.2) shows that 

— 1 < 2e - c ^ Ml/(P ~ 1) 



P 



|z||£-E||z||£| > 7 M^ 



(7.12) 



for some function $c_p(\gamma) > 0$ for $\gamma > 0$. This implies in particular that
is sharply concentrated around the value (which tends to zero 
rapidly for our choices of M). Thus under our assumptions on M , the 
location of the minimum in the limit as N tends to infinity is v\ = 

and = J ^Zfj,, and at this point $ P)i a i Ar(m*e 1 + v) — $ Pi/ 3 ) Ar(m*e 1 ) = 

O (M/Np/ 2 ). 

On the other hand, for \\v\\ p > 2yt~a(m* + c), 

$p,l3,N( m * el + V )~ *p,l3,N( m * el ) ^ + C 3ll^llp > ( 7 - 13 ) 

which completes the problem of localizing the minima of $\Phi$ in the case
$p \geq 3$. Note the very asymmetric shape of the function in their vicinity.

7.2. The case $p = 2$.

The case of the standard Hopfield model turns out to be the more
difficult, but also the most interesting one. The main reason for this
is the fact that an inequality like (7.2) does not hold here. Indeed, it
is easy to see that there exist $v$ such that $(\xi_i, v) = \sqrt{M}\|v\|_2$. The
idea, however, is that this requires that $v$ be adapted to the particular
$\xi_i$, and that it will be impossible, typically, to find a $v$ such that for
many indices $i$, $(\xi_i, v)$ would be much bigger than $\|v\|_2$, and to take
advantage of that fact. The corresponding analysis has been carried out
in [BG5] and we will not repeat all the intermediate technical steps here.
We will however present the main arguments in a streamlined form.
The key idea is to perform a Taylor expansion like in the previous case
only for those indices $i$ for which $(\xi_i, v)$ is small, and to use a uniform
bound for the others. The upper and lower bounds must be treated
slightly differently, so let us look first at the lower bound.
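The remark that a large overlap requires $v$ to be adapted to one particular $\xi_i$ can be illustrated with a small experiment. This is only a sketch under assumed, illustrative sizes ($N = 200$, $M = 50$) and an assumed random seed. Choosing $v = \xi_0$ gives $(\xi_0, v)/\|v\|_2 = \sqrt{M}$ exactly, while the normalized overlaps with the other sites are sums of $M$ independent signs divided by $\sqrt{M}$ and hence typically of order one.

```python
import math
import random

rng = random.Random(0)      # fixed seed, an arbitrary illustrative choice
N, M = 200, 50              # illustrative sizes, not taken from the text
xi = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(N)]

def normalized_overlap(pattern_row, v):
    """(xi_i, v) / ||v||_2 for one site i."""
    norm = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(pattern_row, v)) / norm

v = [float(x) for x in xi[0]]            # v adapted to site 0
adapted = normalized_overlap(xi[0], v)   # equals sqrt(M)
others = [abs(normalized_overlap(xi[i], v)) for i in range(1, N)]
print(adapted, math.sqrt(M), max(others), sum(others) / len(others))
```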
The uniform bound we have here at our disposal is that 



$$-\frac{1}{\beta}\ln\cosh\beta x \geq \frac{(m^*)^2}{2} - \frac{1}{\beta}\ln\cosh\beta m^* - \frac{x^2}{2}$$ (7.14)
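The uniform bound (7.14) says that $x \mapsto \frac{x^2}{2} - \frac{1}{\beta}\ln\cosh\beta x$ attains its minimum at $x = \pm m^*$. A quick numerical sanity check can be sketched as follows; it assumes $\beta > 1$ and takes $m^*$ to be the largest solution of $m = \tanh\beta m$. The function names, the grid, and the sample values of $\beta$ are illustrative assumptions.

```python
import math

def phi(x, beta):
    """x^2/2 - (1/beta) ln cosh(beta x); (7.14) states phi(x) >= phi(m*)."""
    return 0.5 * x * x - math.log(math.cosh(beta * x)) / beta

def magnetization(beta, iterations=10000):
    """Largest solution of m = tanh(beta m), by iteration from m = 1."""
    m = 1.0
    for _ in range(iterations):
        m = math.tanh(beta * m)
    return m

def uniform_bound_gap(beta, half_width=3.0, steps=6001):
    """Minimum over a grid of phi(x) - phi(m*); nonnegative if (7.14) holds."""
    reference = phi(magnetization(beta), beta)
    h = 2.0 * half_width / (steps - 1)
    return min(phi(-half_width + k * h, beta) - reference
               for k in range(steps))

for beta in (1.2, 1.5, 3.0):   # illustrative inverse temperatures
    print(beta, magnetization(beta), uniform_bound_gap(beta))
```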



Using this we get, for a suitably chosen parameter $\tau > 0$, by a simple
computation that for some $0 \leq \theta \leq 1$,



®2,p,N(m*e 1 +v)- ^aKtoV) 

i i i N * N 

>h\v\\l- Ml - K) 2 )^ v) 2 - ^ *) 



2' v v ' 'N^^ 1 ' N 

i=l i—1 



1 1 N 

- -(1 - 0(1 - (m*) 2 ))- £ %^)I>™*}(M 2 



i=i 



The first two lines are the main second order contributions. The third
line is the standard third order remainder, but improved by the
characteristic function that forces $(\xi_i, v)$ to be small. The last line is the
price we have to pay for that, and we will have to show that with large
probability this is also very small. This is the main "difficulty"; for the
third order remainder one may use simply that



1 1 „x|3 9/ ,2 tanh/j(m* 



- /3(m*(l-r)) 
^ ^7 X>' + T)(m*) 2 ^ cosh- 2 (1(m*(l - r)) 



For $\tau$ somewhat small, say $\tau \leq 0.1$, it is not difficult to see that

§1 

3 



— cosh 2 P(m*(l — r)) is bounded uniformly in (3 by a constant of 
order 1. Thus we can for our purposes use



1 N 



\ U M3 ofl 2 tanh/3(m* + 0(&,u)) 

{m,v)\<T m *}\{.£,i,v)\ i(3 — -2 



N 



(7.17) 



<r(l + r)(m*) 2 ^X>^) 



which produces just a small perturbation of the quadratic term. Setting 



$$X_a(v) \equiv \frac{1}{N}\sum_{i=1}^N 1_{\{|(\xi_i, v)| > a\}}\, (\xi_i, v)^2$$ (7.18)



we summarize our finding so far as

Lemma 7.1. There exists $\tau_c > 0$ ($\approx 0.1$) such that for all $\beta$, for
$\tau \leq \tau_c$,



&2,l3,N( m * e + V )~ $ 2,/3,7v(TOV) 



l-(/3(l-(m*) 2 )+r(l + r)(m*) 2 )^ 



l(l-/?(l-( m *)2))X Tm ,( V ) 



v - 



m 
~N 



N 



i=l 



(7.19) 



Before turning to the study of $X_a(v)$, we derive the corresponding upper
bounds. For this we need a complement to (7.14). Using the Taylor
formula with second order remainder we have that for some $\tilde x$



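$$-\frac{1}{\beta}\ln\cosh\beta x \leq \frac{(m^*)^2}{2} - \frac{1}{\beta}\ln\cosh\beta m^* - \frac{x^2}{2} + \frac{(x - m^*)^2}{2}\left[1 - \beta(1 - \tanh^2\beta\tilde x)\right]$$
$$\quad \leq \frac{(m^*)^2}{2} - \frac{1}{\beta}\ln\cosh\beta m^* - \frac{x^2}{2} + \frac{(x - m^*)^2}{2}$$ (7.20)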



By a similar computation as before this gives 

Lemma 7.2. There exists $\tau_c > 0$ ($\approx 0.1$) such that for all $\beta$, for
$\tau \leq \tau_c$,



* 1^ 



$2,/3,7v( TO * e + v) - $2,/3,N(m*e 



- \ 1 ' 



l-(/3(l-(m*y)-r(l + T)(m*) 2 ) 



N 



* N i 
^T,(ii^) + ^(l-(m^))X Tm .(v) 
i=i 



(7.21) 



To make use of these bounds, we need to have uniform control over
the $X_a(v)$. In [BG5] we have proven for this the following

Proposition 7.3. Define 



(l-3y^)2 a 2 



T(a,a/p)= 2\/2v / 2e (W^ *p* + a(\ lna| + 2) + ay/l + r(a) 



+ 2a 2 (l + r(a)) + \a ( 2e ^ +2v/3a(|lna| +2) 



(7.22) 



Then

$$P\left[\sup_{v \in B_\rho} X_a(v) \geq \rho^2\, \Gamma(\alpha, a/\rho)\right] \leq e^{-\alpha N} + P\left[\|A - 1\| \geq r(\alpha)\right]$$ (7.23)



We see that $\Gamma(\alpha, a/\rho)$ is small if $\alpha$ is small and $\rho^2$ is small compared
to $a$, which for us is fine: we need the proposition with $a = \tau m^*$ and
with $\rho \leq c_1 \gamma m^*$, where $\gamma$ is our small parameter. The proof of this
proposition can be found in [BG5]. It is quite technical and uses a
chaining procedure quite similar to the one used in Section 6 in the
proof of Proposition 6.9. Since we have not found a way to simplify or
improve it, we will not reproduce it here. Although in [BG5] only the
Bernoulli case was considered, the extension to centered bounded
variables poses no particular problems and can be left to the reader; of
course constants will change, in particular if the variables are asymmetric.

The expression for $\Gamma(\alpha, a/\rho)$ looks quite awful. However, for $\alpha$
small (which is all we care for here), it is in fact bounded by



T(a,a/p) < C 



e -(i-2v^) +a (| lna | + 2 ) 



(7.24) 
with $C \approx 25$. We should now choose $\tau$ in an optimal way. It is easy to
see that in (7.19), for p < crym* , this leads to r ~ 7a/| hi7|, uniformly 
in $\beta > 1$. This uses that the coefficient of $X_{\tau m^*}(v)$ is proportional
to $(m^*)^2$. Unfortunately, that is not the case in the upper bound of
Lemma 7.2, so that it turns out that while this estimate is fine for $\beta$
away from 1 (e.g. $\beta > 1.1$, which means $m^* > 0.5$), for $\beta$ near one
we have been too careless! This is only just: replacing $\beta(1 - \tanh^2\beta\tilde x)$
by zero and hoping to get away with it was overly optimistic. This is,
however, easily remedied by dealing more carefully with that term. We
will not give the (again somewhat tedious) details here; they can be
found in [BG5]. We just quote from [BG5] (Theorem 4.9)

Lemma 7.4. Assume that $\beta < 1.1$. Then there exists $\tau_c > 0$ ($\approx 0.1$)
such that for $\tau \leq \tau_c$,



$2,p,N(m*e L +v)- <$>2,p,N(m*e L ) 



l-(/3(l-(m*) 2 )-T(l + T)(m*) 2 )^ 



* N 



i=l 



+ T^™*) IMI2 I 7 + 240e 



(7.25) 



For the range of $v$ we are interested in, all these bounds combine to

Theorem 7.5. For all $\beta > 1$ and for all $\|v\|_2 \leq c\gamma m^*$, there exists a
finite numerical constant $0 < C < \infty$ such that



®2,(3,N(m*e L +v)- $2,/3,7v(m*e 1 ) 



1 * N 

i[l-/3(l-(m*) 2 )] |MH-^£&*) 



with probability greater that 1 — e 



i=i 

-aN 



<7v1^^) 2 |l^ll2 
(7.26) 



As an immediate consequence of this bound we can localize the
position of the minima of $\Phi$ near $m^* e^\mu$ rather precisely.

Corollary 7.6. Let $v^*$ denote the position of the lowest minimum of
the function $\Phi_{2,\beta,N}(m^* e^1 + v)$ in the ball $\|v\|_2 \leq c\gamma m^*$. Define the
vector $z^{(\nu)} \in \mathbb{R}^M$ with components



(7.27) 



There exists a finite constant C such that 



m 



1-/3(1 - (m*) 2 )' 

same probability 
greater than 1 — e _4M//5 , 



z (1) || 2 (m*) 



- C7v ^(l-/3(l-(mW 

(7.28) 

with the same probability as in Theorem 7.5. Moreover, with probability 



\z {1) h < 2 v^ 



(7.29) 



so that in fact 



1-/3(1- (m*) 2 )' 



< CWl ln 7l TO * 



(7.30) 



Proof. (7.28) is straightforward from Theorem 7.5. The bound
on $\|z^{(1)}\|_2$ was given in [BG5], Lemma 4.11, and follows from quite
straightforward exponential estimates.



Remark. We will see in the next section that for $\beta$ not too large
(depending on $\alpha$), there is actually a unique minimum for $\|v\|_2 \leq c\gamma m^*$.
8. Convexity, the replica symmetric solution, convergence

In this final section we restrict our attention to the standard Hop- 
field model. Most of the results presented here were inspired by a recent 
paper of Talagrand [T4] . 

In the last section we have seen that the function $\Phi$ is locally
bounded from above and below by quadratic functions. A natural
question is to ask whether this function may even be locally convex. The
following theorem (first proven in [BG5]) shows that this is true under
some further restrictions on the range of the parameters.



Theorem 8.1. Assume that $1 < \beta < \infty$. If the parameters $\alpha, \beta, \rho$ are
such that for $\epsilon > 0$,

$$\inf_\tau \Big(\beta\big(1 - \tanh^2(\beta m^*(1 - \tau))\big)(1 + 3\sqrt{\alpha}) + 2\beta\tanh^2(\beta m^*(1 - \tau))\, \Gamma(\alpha, \tau m^*/\rho)\Big) \leq 1 - \epsilon$$ (8.1)

then with probability one for all but a finite number of indices $N$,
$\Phi_{N,\beta}[\omega](m^* e^1 + v)$ is a twice differentiable and strictly convex function
of $v$ on the set $\{v : \|v\|_2 \leq \rho\}$, and $\lambda_{\min}(\nabla^2\Phi_{N,\beta}[\omega](m^* e^1 + v)) \geq \epsilon$
on this set.



Remark. The theorem should of course be used for $\rho = c\gamma m^*$. One
checks easily that with such $\rho$, the conditions mean: (i) for $\beta$ close to
1: $\gamma$ small, and (ii) for $\beta$ large: $\alpha \leq c\beta^{-1}$.

Remark. In deviation from our general policy not to speak about the 
high-temperature regime, we note that it is of course trivial to show 
that Xmin (V 2 $Ar 5/ 3[a;](m)) > e for all m if (3 < ^^=p ■ Therefore 
all the results below can be easily extended into that part of the high- 
temperature regime. Note that this does not cover all of the high 
temperature phase, which starts already at $\beta^{-1} = 1 + \sqrt{\alpha}$.



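Proof. The differentiability for fixed $N$ is no problem. The non-trivial
assertion of the theorem is the local convexity. Since
$\frac{d^2}{dx^2}\, \beta^{-1}\ln\cosh(\beta x) = \beta(1 - \tanh^2(\beta x))$ we get

$$\nabla^2\Phi(m^* e^1 + v) = 1 - \frac{\beta}{N}\sum_{i=1}^N \xi_i^T\xi_i \left[1 - \tanh^2\big(\beta(m^*\xi_i^1 + (\xi_i, v))\big)\right]$$
$$\quad = 1 - \beta\frac{\xi^T\xi}{N} + \frac{\beta}{N}\sum_{i=1}^N \xi_i^T\xi_i \tanh^2\big(\beta(m^*\xi_i^1 + (\xi_i, v))\big)$$
$$\quad \geq 1 - \beta\frac{\xi^T\xi}{N} + \frac{\beta}{N}\sum_{i=1}^N \xi_i^T\xi_i\, 1_{\{|(\xi_i, v)| \leq \tau m^*\}} \tanh^2(\beta m^*(1 - \tau))$$
$$\quad = 1 - \beta\left[1 - \tanh^2(\beta m^*(1 - \tau))\right]\frac{\xi^T\xi}{N} - \beta\tanh^2(\beta m^*(1 - \tau))\frac{1}{N}\sum_{i=1}^N \xi_i^T\xi_i\, 1_{\{|(\xi_i, v)| > \tau m^*\}}$$ (8.2)

Thus

$$\lambda_{\min}\big(\nabla^2\Phi(m^* e^1 + v)\big) \geq 1 - \beta\left[1 - \tanh^2(\beta m^*(1 - \tau))\right]\|A(N)\| - \beta\tanh^2(\beta m^*(1 - \tau))\left\|\frac{1}{N}\sum_{i=1}^N 1_{\{|(\xi_i, v)| > \tau m^*\}}\, \xi_i^T\xi_i\right\|$$ (8.3)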

What we need to do is to estimate the norm of the last term in (8.3). 
Now, 



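$$\sup_{v \in B_\rho}\left\|\frac{1}{N}\sum_{i=1}^N 1_{\{|(\xi_i, v)| > \tau m^*\}}\, \xi_i^T\xi_i\right\| = \sup_{v \in B_\rho}\ \sup_{w : \|w\|_2 = \rho} \frac{1}{\rho^2 N}\sum_{i=1}^N 1_{\{|(\xi_i, v)| > \tau m^*\}}\, (\xi_i, w)^2$$
$$\quad \leq \frac{1}{\rho^2}\sup_{v \in B_\rho}\ \sup_{w \in B_\rho} \frac{1}{N}\sum_{i=1}^N 1_{\{|(\xi_i, v)| > \tau m^*\}}\, (\xi_i, w)^2$$ (8.4)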



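To deal with this last expression, notice that

$$1_{\{|(\xi_i, v)| > \tau m^*\}}(\xi_i, w)^2 = 1_{\{|(\xi_i, v)| > \tau m^*\}}(\xi_i, w)^2\left(1_{\{|(\xi_i, w)| < |(\xi_i, v)|\}} + 1_{\{|(\xi_i, w)| \geq |(\xi_i, v)|\}}\right)$$
$$\quad \leq 1_{\{|(\xi_i, v)| > \tau m^*\}}(\xi_i, v)^2 + 1_{\{|(\xi_i, w)| > \tau m^*\}}(\xi_i, w)^2$$ (8.5)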
Thus

$$\frac{1}{N}\sum_{i=1}^N 1_{\{|(\xi_i, v)| > \tau m^*\}}(\xi_i, w)^2 \leq X_{\tau m^*}(v) + X_{\tau m^*}(w)$$ (8.6)

and so we are reduced to estimating the same quantities as in Section
7. Thus using Proposition 7.3 and the estimate (4.12) with $\epsilon = \sqrt{\alpha}$,
we obtain therefore that with probability greater than $1 - e^{-\mathrm{const}\cdot\alpha N}$
for all $v$ with norm less than $\rho$,

$$\lambda_{\min}\big(\nabla^2\Phi(m^* e^1 + v)\big) \geq 1 - \beta\left[1 - \tanh^2(\beta m^*(1 - \tau))\right](1 + 3\sqrt{\alpha}) - 2\beta\tanh^2(\beta m^*(1 - \tau))\, \Gamma(\alpha, \tau m^*/\rho)$$ (8.7)

Optimizing over r gives the claim of the theorem. 
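To get a feeling for the size of the terms in this proof, one can evaluate the Hessian at the centre of the ball, $v = 0$, for a random sample of patterns. There $\tanh^2(\beta m^* \xi_i^1) = (m^*)^2$ for every $i$, so the Hessian reduces to $1 - \beta(1 - (m^*)^2)\, \xi^T\xi/N$ and its lowest eigenvalue is $1 - \beta(1 - (m^*)^2)\|\xi^T\xi/N\|$. The sketch below is only an illustration: the sizes $N = 400$, $M = 8$, the value $\beta = 2$, the seed, and the use of plain power iteration are assumptions made here, and the comparison value $(1 + \sqrt{\alpha})^2$ is the familiar large-$N$ heuristic for the operator norm, not a bound taken from the text.

```python
import math
import random

def curie_weiss_magnetization(beta, iterations=2000):
    """Largest solution of m = tanh(beta m), by iteration from m = 1."""
    m = 1.0
    for _ in range(iterations):
        m = math.tanh(beta * m)
    return m

def top_eigenvalue(xi, iterations=200):
    """Power iteration for the largest eigenvalue of xi^T xi / N."""
    n_sites, n_patterns = len(xi), len(xi[0])
    x = [1.0 / math.sqrt(n_patterns)] * n_patterns
    eigenvalue = 0.0
    for _ in range(iterations):
        overlaps = [sum(r * c for r, c in zip(row, x)) for row in xi]
        y = [sum(xi[i][mu] * overlaps[i] for i in range(n_sites)) / n_sites
             for mu in range(n_patterns)]
        eigenvalue = math.sqrt(sum(c * c for c in y))
        x = [c / eigenvalue for c in y]
    return eigenvalue

rng = random.Random(1)              # arbitrary fixed seed
N, M, beta = 400, 8, 2.0            # illustrative values only
xi = [[rng.choice((-1, 1)) for _ in range(M)] for _ in range(N)]
alpha = M / N
m_star = curie_weiss_magnetization(beta)
norm_A = top_eigenvalue(xi)
hessian_min = 1.0 - beta * (1.0 - m_star ** 2) * norm_A
heuristic = 1.0 - beta * (1.0 - m_star ** 2) * (1.0 + math.sqrt(alpha)) ** 2
print(norm_A, hessian_min, heuristic)
```

Away from $v = 0$ the indicator term in (8.3) no longer vanishes, which is exactly where the control of $X_{\tau m^*}$ from Proposition 7.3 enters.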



Remark. Note that the estimates derived from (8.7) become quite
bad if $\beta$ is large. Thus local convexity appears to break down for
some critical $\beta_{conv}(\alpha)$ that tends to infinity as $\alpha \downarrow 0$. In the heuristic
picture [AGS] such a critical line appears as the boundary of the region
where the so-called replica symmetry is supposed to hold. It is very
instructive to read what Amit et al. write on replica symmetry breaking
in the retrieval phases: "....the very occurrence of RSB 8 implies that the
energy landscape of the basin of each of the retrieval phases has features
that are similar to the SG 9 phase. In particular, each of the retrieval
phases represents many degenerate retrieval states. All of them have
the same macroscopic overlap m, but they differ in the location of the
errors. These states are organized in an ultrametric structure" ([AGS],
page 59). Translated to our language, this means that replica symmetry
breaking is seen as a failure of local convexity and the appearance of
many local minima. On this basis we conjectured in [BG5] that replica
symmetry is closely related to the local convexity of the free energy
functional 10.

8 = replica symmetry breaking

9 = spin glass

10 We should note, however, that our condition for local convexity (roughly
$\beta^{-1} > \alpha$) does not have the same behaviour as is found for the stability of the replica
symmetric solution in [AGS] ($\beta^{-1} > \exp(-1/2\alpha)$). It is rather clear that our
condition for convexity cannot be substantially improved. On the other hand, Talagrand
has informed us that his method of deriving the replica symmetric solution which
We can now make these observations more precise. While we
have so far avoided this, now is the time to make use of the
Hubbard-Stratonovich transformation [HS] for the case of quadratic $E_M$. That
is, we consider the new measures Q/3,n,m = Qp n m defined in (5.14). 
They have the remarkable property that they are absolutely continuous 
w.r.t. Lebesgue measure with density 

$$\frac{1}{Z_{\beta,N,M}}\exp\left(-\beta N\Phi_{\beta,N,M}(z)\right)$$ (8.8)



(do the computation or look it up in [BGP1]). Moreover, in many
computations it can conveniently replace the original measure $Q$. In
particular, the following identity holds for all $t \in \mathbb{R}^M$:

$$\int dQ_{\beta,N,M}(m)\, e^{(t,m)} = e^{-\frac{\|t\|_2^2}{2\beta N}}\int d\tilde Q_{\beta,N,M}(z)\, e^{(t,z)}$$ (8.9)



Since for $t$ with bounded norm the first factor tends to one rapidly, this
shows that the exponential moments of $Q$ and $\tilde Q$ are asymptotically
equal. We will henceforth assume that we are in a range of $\beta$ and $\alpha$
such that the union of the balls $B_{\rho(\epsilon)}(s m^* e^\mu)$ has essentially full mass
under $\tilde Q$.

To study one of the balls, we define for simplicity the conditional
measures

$$\tilde Q^{(1,1)}_{\beta,N,M}(\cdot) = \tilde Q_{\beta,N,M}\left(\cdot \mid z \in B_{\rho(\epsilon)}(m^* e^1)\right)$$ (8.10)

with $\rho(\epsilon)$ such that Theorem 8.1 holds. (Alternatively we could consider
tilted measures with $h$ proportional to $e^1$ and arbitrarily small.) For
notational convenience we will introduce the abbreviation $E_{\tilde Q}$ for the
expectation w.r.t. the measure $\tilde Q^{(1,1)}_{\beta,N,M}$.

Now intuitively one would think that since $\tilde Q^{(1,1)}_{\beta,N,M}$ has a density
of the form $e^{-NV(z)}$ with a convex $V$ with strictly positive second
derivative, this measure should have similar properties as for quadratic
$V$. It turns out that this is to some extent true. For instance, we have:

Theorem 8.2. Under the hypothesis of Theorem 8.1, and with the
same probability as in the conclusion of that theorem, for any $t \in \mathbb{R}^M$



does not require convexity, can be extended to work under essentially the conditions 
of [AGS]. 
with $\|t\|_2 \leq C < \infty$,

$$e^{(t, E_{\tilde Q}z)} - O(e^{-M}) \leq E_{\tilde Q}e^{(t,z)} \leq e^{(t, E_{\tilde Q}z)}\, e^{\|t\|_2^2/\epsilon N} + O(e^{-M})$$ (8.11)

In particular, the marginal distributions of $\tilde Q$ converge to Dirac
distributions concentrated on the corresponding projections of $E_{\tilde Q}z$.

Proof. The main tools in proving this theorem are the so-called
Brascamp-Lieb inequalities 8 [BL]. We paraphrase them as follows.

Lemma 8.3. [Brascamp-Lieb [BL]] Let $V : \mathbb{R}^M \to \mathbb{R}$ be non-negative
and strictly convex with $\lambda_{\min}(\nabla^2 V) \geq \epsilon$. Denote by $E_V$ expectation
with respect to the probability measure

$$\frac{e^{-NV(x)}\, d^Mx}{\int e^{-NV(x)}\, d^Mx}$$ (8.12)

Let $f : \mathbb{R}^M \to \mathbb{R}$ be any continuously differentiable function. Then

$$E_V(f - E_Vf)^2 \leq \frac{1}{\epsilon N}E_V\left(\|\nabla f\|_2^2\right)$$ (8.13)
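A one-dimensional example makes the content of (8.13) concrete. The sketch below is an illustration under assumptions chosen here, not part of the original proof: it takes $M = 1$, $V(x) = \frac{\epsilon}{2}x^2 + \frac{1}{4}x^4$ (so that $V'' \geq \epsilon$), $\epsilon = 0.5$, $N = 10$, and evaluates all expectations by a simple Riemann sum. It then compares the variance of $f(x) = x$ and of $f(x) = x^2$ with the right-hand side of (8.13).

```python
import math

def expectation(f, V, N, lo=-4.0, hi=4.0, steps=8001):
    """E_V f for the density proportional to exp(-N V(x)) on [lo, hi]."""
    h = (hi - lo) / (steps - 1)
    numerator = denominator = 0.0
    for k in range(steps):
        x = lo + k * h
        weight = math.exp(-N * V(x))
        numerator += weight * f(x)
        denominator += weight
    return numerator / denominator

eps, N = 0.5, 10                     # illustrative values only

def V(x):
    return 0.5 * eps * x * x + 0.25 * x ** 4   # V''(x) = eps + 3 x^2 >= eps

mean_x = expectation(lambda x: x, V, N)
second_moment = expectation(lambda x: x * x, V, N)
fourth_moment = expectation(lambda x: x ** 4, V, N)

var_x = second_moment - mean_x ** 2            # variance of f(x) = x
bound_x = 1.0 / (eps * N)                      # since |f'|^2 = 1
var_x2 = fourth_moment - second_moment ** 2    # variance of f(x) = x^2
bound_x2 = 4.0 * second_moment / (eps * N)     # since |f'|^2 = 4 x^2
print(var_x, bound_x, var_x2, bound_x2)
```

For a purely quadratic $V$ the bound for $f(x) = x$ is attained; the quartic term only makes the measure more concentrated, which is why the inequality is strict here.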



We see that we are essentially in a situation where we can apply
Lemma 8.3. The only difference is that our measures are supported only
on a subset of $\mathbb{R}^M$. This is however no problem: we may either continue
the function $\Phi(m)$ as a strictly convex function to all of $\mathbb{R}^M$ and study the
corresponding measures, noting that all reasonable expectations differ
only by exponentially small terms, or one may run through the proof
of Lemma 8.3 to see that the boundary terms we introduce only lead to
exponentially small error terms in (8.13). We will disregard this issue
in order not to complicate things unnecessarily. To see how Lemma 8.3
works, we deduce the following

Corollary 8.4. Let $E_V$ be as in Lemma 8.3. Then
(i) $E_V\|x - E_Vx\|_2^2 \leq \frac{M}{\epsilon N}$
(ii) $E_V\|x - E_Vx\|_4^4 \leq 4\frac{M}{\epsilon^2N^2}$
(iii) For any function $f$ such that $V_t(x) = V(x) - tf(x)/N$ for $t \in [0,1]$
is still strictly convex and $\lambda_{\min}(\nabla^2V_t) \geq \epsilon' > 0$, then

$$0 \leq \ln E_Ve^f - E_Vf \leq \frac{1}{2\epsilon'N}\sup_{t \in [0,1]}E_{V_t}\|\nabla f\|_2^2$$ (8.14)

8 We thank Dima Ioffe for having brought these to our attention.
In particular,




M 



Proof. (i) Choose $f(x) = x_\mu$ in (8.13). Insert and sum. (ii) Choose
$f(x) = x_\mu^2$ and use (i). (iii) Note that



where by assumption $V_s(x)$ has the same properties as $V$ itself. Thus
using (8.13) gives (8.15). (iv) and (v) follow with the corresponding
choices for $f$ easily.



Theorem 8.2 is thus an immediate consequence of (iv). 



We now come to the main result of this section. We will show 
that Theorem 8.1 in fact implies that the replica symmetric solution of 
[AGS] is correct in the range of parameters where Theorem 8.1 holds. 
Such a result was recently proven by Talagrand [T4] , but we shall see 
that using Theorem 8.1 and the Brascamp-Lieb inequalities, we can 
give a greatly simplified proof. 

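Theorem 8.5. Assume that the parameters $\beta, \alpha$ are such that the
conditions both of Theorem 6.2 and of Theorem 8.1 are satisfied, with
$\epsilon > 0$ and $\rho \geq c\gamma m^*$, where $c$ is such that the mass of the complement
of the set $\cup_{s,\mu}B_{c\gamma m^*}(s m^* e^\mu)$ is negligible. Then, the replica symmetric
solution of [AGS] holds in the sense that, asymptotically, as $N \uparrow \infty$,
$E_{\tilde Q}z_1$ and $E\|E_{\tilde Q}\hat z\|_2^2$ (recall that $\hat z = (0, z_2, \ldots)$) converge almost surely
to the positive solution $\hat\mu$ and $r$ of the system of equations

$$\hat\mu = \int \frac{dg}{\sqrt{2\pi}}\, e^{-g^2/2}\tanh\big(\beta(\hat\mu + \sqrt{\alpha r}\, g)\big)$$ (8.16)

$$q = \int \frac{dg}{\sqrt{2\pi}}\, e^{-g^2/2}\tanh^2\big(\beta(\hat\mu + \sqrt{\alpha r}\, g)\big)$$ (8.17)

$$r = \frac{q}{(1 - \beta + \beta q)^2}$$ (8.18)

(note that $q$ is an auxiliary variable that could be eliminated).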
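For orientation, the replica symmetric fixed point can be computed numerically. The sketch below assumes the standard form of the [AGS] equations, $\hat\mu = E\tanh(\beta(\hat\mu + \sqrt{\alpha r}\, g))$, $q = E\tanh^2(\beta(\hat\mu + \sqrt{\alpha r}\, g))$, $r = q/(1 - \beta + \beta q)^2$ with $g$ standard normal; this form, the plain fixed-point iteration, the grid quadrature, and the sample values $\alpha = 0.01$, $\beta = 2$ are all assumptions made for illustration rather than statements taken from the theorem.

```python
import math

def gaussian_nodes(half_width=8.0, steps=801):
    """Equally spaced nodes and weights approximating a standard normal average."""
    h = 2.0 * half_width / (steps - 1)
    nodes = [-half_width + k * h for k in range(steps)]
    weights = [h * math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi)
               for g in nodes]
    return nodes, weights

def solve_replica_symmetric(alpha, beta, iterations=300):
    """Iterate the assumed AGS equations starting from mu = q = 1."""
    nodes, weights = gaussian_nodes()
    mu, q = 1.0, 1.0
    for _ in range(iterations):
        r = q / (1.0 - beta + beta * q) ** 2
        scale = math.sqrt(alpha * r)
        values = [math.tanh(beta * (mu + scale * g)) for g in nodes]
        mu = sum(w * t for w, t in zip(weights, values))
        q = sum(w * t * t for w, t in zip(weights, values))
    r = q / (1.0 - beta + beta * q) ** 2
    return mu, q, r

mu_hat, q, r = solve_replica_symmetric(alpha=0.01, beta=2.0)
print(mu_hat, q, r)
```

For $\alpha \downarrow 0$ the Gaussian term disappears and the first equation reduces to the Curie-Weiss equation $\hat\mu = \tanh\beta\hat\mu$, with $q = \hat\mu^2$.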

Remark. As far as Theorem 8.5 is considered as a result on conditional 
measures only, it is possible to extend its validity beyond the regime of 
Theorem 6.2. In that case, what is needed is only Theorem 8.1 and the 
control of the location of the local minima given by Theorem 7.5. One 
may also, in this spirit, consider the extension of this result to other 
local minima (corresponding to the so-called "mixed patterns"), which
would, of course, require proving the analogues of Theorems 7.5 and 8.1
in this case, as well as carrying out the stability analysis of a certain
dynamical system (see below). We do not doubt that this can be done.
Remark. We will not enter into the discussion on how these equations 
were originally derived with the help of the replica trick. This is well 
explained in [AGS]. In [T4] it is also shown how one can derive on this
basis the formula for the free energy as a function of $\hat\mu$, $r$, and $q$ that is
given in [AGS] and for which the above equations are the saddle point 
equations. We will not repeat these arguments here. 
Remark. In [PST] it was shown that the replica symmetric solution
holds if the so-called Edwards-Anderson parameter, $\frac{1}{N}\sum_i[\mathcal{G}_{\beta,N,M}(\sigma_i)]^2$,
is self-averaging. Some of the basic ideas in that paper are used both
in Talagrand's and in our proof below. In fact we follow the strategy 
of [PST] more closely than Talagrand, and we will see that this leads 
immediately to the possibility of studying the limiting Gibbs measures. 

Proof. It may be well worthwhile to outline the strategy of the proof
in a slightly informal way before we go into the details. This may also
give a new explanation to the mysterious looking equations above. It
turns out that in a very specific sense, the idea of these equations and
their derivation is closely related to the original idea of "mean field
theory". Let us briefly recall what this means. The standard derivation
of "mean field" equations for homogeneous magnets in most textbooks
on statistical mechanics does not start from the Curie-Weiss model but
from (i) the hypothesis that in the infinite volume limit, the spins are
independent and identically distributed under the limiting (extremal)
Gibbs measure and that (ii) their distribution is of the form $e^{\beta\sigma_i m}$ where
$m$ is the mean value of the spin under this same measure, and that is
assumed to be an almost sure constant with respect to the Gibbs
measure. The resulting consistency equation is then $m = \tanh\beta m$. This
derivation breaks down in random systems, since it would be
unreasonable to think that the spins are identically distributed. Of course
one may keep the assumption of independence, and write down a set of
consistency equations (in the spin-glass case, these are known as
TAP-equations [TAP]). Let us try the idea in the Hopfield model. The spin $\sigma_i$
here couples to a "mean field" $h_i(\sigma) = (\xi_i, m(\sigma))$, which is a function
of the entire vector of magnetizations. To obtain a self-consistent set of
equations we would have to compute all of these, leading to the system

$$m^\mu = \frac{1}{N}\sum_i \xi_i^\mu\tanh\big(\beta(\xi_i, m)\big)$$ (8.19)

Solving this is a hopelessly difficult task when $M$ is growing somewhat
fast with $N$, and it is not clear why one should expect these quantities
to be constants when $M = \alpha N$.

But now suppose it were true that we could somehow compute the
distribution of $h_i(\sigma)$ a priori as a function of a small number of
parameters, not depending on $i$. Assume further that these parameters are
again functions of the distribution of the mean field. Then we could
write down consistency conditions for them and (hopefully) solve them.
In this way the expectation of $\sigma_i$ could be computed. The tricky part
is thus to find the distribution of the mean field 8. Miraculously, this
can be done, and the relevant parameters turn out to be the quantities
$\hat\mu$ and $r$, with (8.16)-(8.18) the corresponding consistency equations 9.

We will now follow these ideas and give the individual steps a 
precise meaning. In fact, the first step in our proof corresponds to 
proving a version of Lemma 2.2 of [PST], or if one prefers, a sharpened 
version of Lemma 4.1 of [T4]. Note that we will never introduce any 
auxiliary Gaussian fields in the Hamiltonian, as is done systematically 
in [PST] and sometimes in [T4]; all comparison to quantities in these 



8 This idea seems related to statements of physicists one sometimes finds in the
literature that in spin glasses the relevant "order parameter" is actually a
probability distribution.

9 In fact, we will see that the situation is just a bit more complicated. For finite 
N, the distribution of the mean field will be seen to depend essentially on three 
$N$-dependent, non-random quantities whose limits, should they exist, are related
to $\hat\mu$, $r$ and $q$. Unfortunately, one of the notorious problems in disordered mean
field type models is that one cannot prove a priori such intuitively obvious facts 
like that the mean values of thermodynamic quantities (such as the free energy, 
etc.) converge, even when it is possible to show that their fluctuations converge 
to zero (this sad fact is sometimes overlooked). We shall see that convergence of 
the quantities involved here can be proven in the process, using properties of the 
recurrence equations for which the equations above are the fixed point equations, 
and a priori control on the overlap distribution as results from Theorem 6.2 (or 7.5). 
papers is thus understood modulo removal of such terms. Let us begin
by mentioning that the crucial quantity $u(\tau)$ defined in Definition 5 of
[PST] has the following nice representation 10

$$u(\tau) = \ln\int d\tilde Q^{(1,1)}_{\beta,N,M}(z)\, e^{\tau\beta(\eta,z)}$$ (8.20)

where, like Talagrand in [T4], we singled out the site $N + 1$ (instead
of 1 as in [PST]) and set $\xi_{N+1} = \eta$. For notational simplicity we will
denote the expectation w.r.t. the measure $\tilde Q^{(1,1)}_{\beta,N,M}$ by $E_{\tilde Q}$ and we will
set $\bar z = z - E_{\tilde Q}z$.

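Lemma 8.6. Under the hypotheses of Theorem 8.5 we have that

(i) With probability exponentially close to 1,

$$E_\eta E_{\tilde Q}e^{\tau\beta(\eta,\bar z)} = e^{\frac{\tau^2\beta^2}{2}E_{\tilde Q}\|\bar z\|_2^2 + R}$$ (8.21)

where $|R| \leq \frac{C}{N}$.

(ii) Moreover,

$$E_\eta\left(E_{\tilde Q}e^{\tau\beta(\eta,\bar z)} - E_\eta E_{\tilde Q}e^{\tau\beta(\eta,\bar z)}\right)^2 \leq \frac{C}{N}$$ (8.22)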



Proof. Note first that 

E^Ege^' 2 ") < Ege^ 112 " 11 ' (8.23) 

and also 

E,Ege^< 2 ) > ^e^W-A\l-^\\A\\ (8 . 24) 

(8.23) looks most encouraging and (ii) of Corollary 8.4 leaves hope for 
the II^Hl to be irrelevant. Of course for this we want the expectation to 
move up into the exponent. To do this, we use (iii) of Corollary 8.4 with 

/ chosen ||| and T ^ lklll~ T i2 \\ z \\ti respectively. For this we 

have to check the strict convexity of $ + -^/ in these cases. But a simple 
computation shows that in both cases A m i n (V 2 ($ + jjf)) > e— 7^, so 
that for any r, (3 there is no problem if N is large enough (Note that the 
quartic term has the good sign!). A straightforward calculation shows 



Actually, our definition differs by an irrelevant constant from that of [PST]. 
that this gives (8.21).

To prove (ii), it is enough to compute 

(EQe Tf3(v ^y = E^Ege^'** 2 ') (8.25) 

where we (at last!) introduced the "replica" z' that is an independent 
copy of the random variable z. By some abuse of notation Eg also 
denotes the product measure for these two copies. By the same token 
as in the proof of (i), we see that, 

E,Ege T ^> 2+ *'> = e T -^^\\z+z'\\ 2 2 +o(i/N) (8 _ 26) 

Finally, 

e q\\z + z'\\1 = 2Eq\\z\\1 + 2Eq(z,z') = 2Eg||z||| (8.27) 

Inserting this and (8.21) into the left hand side of (8.22) establishes 
that bound. This concludes the proof of Lemma 8.6. 



An easy corollary gives what Talagrand's Lemma 4.1 should be: 

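Corollary 8.7. Under the hypotheses of Lemma 8.6, there exists a
finite numerical constant $c$ such that

$$u(\tau) = \beta\tau(\eta, \mathbb{E}_{\tilde{\mathcal{Q}}} z) + \frac{\tau^2\beta^2}{2}\mathbb{E}_{\tilde{\mathcal{Q}}}\|\bar z\|_2^2 + R_N \qquad (8.28)$$

where

$$\mathbb{E}|R_N|^2 \le \frac{c}{N} \qquad (8.29)$$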



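Proof. Obviously

$$\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\tau\beta(\eta,z)} = e^{\tau\beta(\eta,\mathbb{E}_{\tilde{\mathcal{Q}}} z)}\;\mathbb{E}_\eta\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\tau\beta(\eta,\bar z)}\;\frac{\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\tau\beta(\eta,\bar z)}}{\mathbb{E}_\eta\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\tau\beta(\eta,\bar z)}} \qquad (8.30)$$

Taking logarithms, the first two factors in (8.30) together with (8.21)
give the two first terms in (8.28) plus a remainder of order $1/N$. For the
last factor, we notice first that by Corollary 8.4, (iii),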



(8.31) 



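so that for $\alpha$ small, $\tau$ and $\beta\alpha$ bounded,
$\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\tau\beta(\eta,\bar z)}$ is bounded away
from 0 and infinity; we might for instance think that
$\frac12 \le \mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\tau\beta(\eta,\bar z)} \le 2$.
But for $A$, $B$ in a compact interval of the positive half line not
containing zero, there is a finite constant $C$ such that
$|\ln\frac{A}{B}| = |\ln A - \ln B| \le C|A - B|$. Using this gives

$$\mathbb{E}_\eta\left[\ln\frac{\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\tau\beta(\eta,\bar z)}}{\mathbb{E}_\eta\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\tau\beta(\eta,\bar z)}}\right]^2 \le C^2\,\mathbb{E}_\eta\left(\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\tau\beta(\eta,\bar z)} - \mathbb{E}_\eta\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\tau\beta(\eta,\bar z)}\right)^2 \qquad (8.32)$$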



From this and (8.22) follows the estimate (8.29). 



We have almost proven the equivalent of Lemma 2.2 in [PST]. What 
remains to be shown is 

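Lemma 8.8: Under the assumptions of Theorem 8.1,
$(\eta, \mathbb{E}_{\tilde{\mathcal{Q}}} z)$ converges in law to
$\eta_1\mu + \sqrt{\alpha r}\,g$, where
$\mu = \lim_{N\uparrow\infty}\mathbb{E}_{\tilde{\mathcal{Q}}} z_1$ and
$r = \alpha^{-1}\lim_{N\uparrow\infty}\|\mathbb{E}_{\tilde{\mathcal{Q}}}\hat z\|_2^2$,
where $\hat z \equiv (0, z_2, z_3, \dots)$ and $g$ is a standard
normal random variable.

Quasiproof: [PST] The basic idea behind this lemma is that for all
$\mu > 1$, $\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu$ tends to zero, the $\eta_\mu$ are
independent amongst each other and of the
$\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu$, and that therefore
$\sum_{\mu>1}\eta_\mu\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu$ converges to a
Gaussian with variance
$\lim_{N\uparrow\infty}\|\mathbb{E}_{\tilde{\mathcal{Q}}}\hat z\|_2^2$.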



To make this idea precise is somewhat subtle. First, to prove a
central limit theorem, one has to show that some version of the
Lindeberg condition [CT] is satisfied in an appropriate sense. To do this we
need some more facts about self-averaging. Moreover, one has to make
precise to what extent the quantities
$\mathbb{E}_{\tilde{\mathcal{Q}}} z_1$ and
$\|\mathbb{E}_{\tilde{\mathcal{Q}}}\hat z\|_2^2$ converge, as
$N$ tends to infinity. There is no way to prove this a priori, and only
at the end of the proof of Theorem 8.5 will it be clear that this is the
case. Thus we cannot and will not use Lemma 8.8 in the proof of the
Theorem, but a weaker statement formulated as Lemma 8.13 below.

The following lemma follows easily from the proof of Talagrand's
Proposition 4.3 in [T5].

Lemma 8.9. Assume that $f(x)$ is a convex random function defined
on some open neighborhood $U \subset \mathbb{R}$. Assume that $f$ verifies for all
$x \in U$ that $|(\mathbb{E}f)''(x)| \le C < \infty$ and
$\mathbb{E}(f(x) - \mathbb{E}f(x))^2 \le S^2$. Then, if $x \pm S/C \in U$,

$$\mathbb{E}\left(f'(x) - \mathbb{E}f'(x)\right)^2 \le 12\,C S \qquad (8.33)$$



But as so often in this problem, variance estimates are not quite 
sufficient. We will need the following, sharper estimate (which may be 
well known): 

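Lemma 8.10. Assume that $f(x)$ is a random function defined on some
open neighborhood $U \subset \mathbb{R}$. Assume that $f$ verifies for all $x \in U$ that
for all $0 \le r \le 1$,

$$\mathbb{P}\left[|f(x) - \mathbb{E}f(x)| > r\right] \le c\exp\left(-\frac{Nr^2}{c}\right) \qquad (8.34)$$

and that, at least with probability $1 - p$, $|f'(x)| \le C$, $|f''(x)| \le C < \infty$
both hold uniformly in $U$. Then, for any $0 < \zeta < 1/2$, and for any
$0 < \delta < N^{\zeta/2}$,

$$\mathbb{P}\left[|f'(x) - \mathbb{E}f'(x)| > \delta N^{-\zeta/2}\right] \le \frac{32C^2}{\delta^2}N^{\zeta}\exp\left(-\frac{\delta^4N^{1-2\zeta}}{256\,C^2c}\right) + p \qquad (8.35)$$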



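Proof. Let us assume that $|U| \le 1$. We may first assume that
the boundedness conditions for the derivatives of $f$ hold uniformly; by
standard arguments one shows that if they only hold with probability
$1 - p$, the effect is nothing more than the final summand $p$ in (8.35).
The first step in the proof consists in showing that (8.34) together with
the boundedness of the derivative of $f$ implies that $f(x) - \mathbb{E}f(x)$ is
uniformly small. To see this introduce a grid of spacing $\epsilon$, i.e. let
$U_\epsilon = U \cap \epsilon\mathbb{Z}$. Clearly

$$\begin{aligned}
\mathbb{P}\Big[\sup_{x\in U}|f(x) - \mathbb{E}f(x)| > r\Big]
&\le \mathbb{P}\Big[\sup_{x\in U_\epsilon}|f(x) - \mathbb{E}f(x)| + \sup_{x,y:\,|x-y|\le\epsilon}\big(|f(x) - f(y)| + |\mathbb{E}f(x) - \mathbb{E}f(y)|\big) > r\Big] \\
&\le \mathbb{P}\Big[\sup_{x\in U_\epsilon}|f(x) - \mathbb{E}f(x)| > r - 2C\epsilon\Big] \\
&\le \epsilon^{-1}\,\mathbb{P}\big[|f(x) - \mathbb{E}f(x)| > r - 2C\epsilon\big]
\end{aligned} \qquad (8.36)$$

If we choose $\epsilon = \frac{r}{4C}$, this yields

$$\mathbb{P}\Big[\sup_{x\in U}|f(x) - \mathbb{E}f(x)| > r\Big] \le \frac{4C}{r}\exp\left(-\frac{Nr^2}{4c}\right) \qquad (8.37)$$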



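Next we show that if $\sup_{x\in U}|f(x) - g(x)| \le r$ for two functions $f$, $g$
with bounded second derivative, then

$$|f'(x) - g'(x)| \le \sqrt{8Cr} \qquad (8.38)$$

For notice that

$$\left|\frac{1}{\epsilon}\left[f(x+\epsilon) - f(x)\right] - f'(x)\right| \le \frac{\epsilon}{2}\sup_{x\le y\le x+\epsilon}|f''(y)| \le \frac{C\epsilon}{2} \qquad (8.39)$$

so that

$$|f'(x) - g'(x)| \le \frac{1}{\epsilon}\left|f(x+\epsilon) - g(x+\epsilon) - f(x) + g(x)\right| + C\epsilon \le \frac{2r}{\epsilon} + C\epsilon \qquad (8.40)$$

Choosing the optimal $\epsilon = \sqrt{2r/C}$ gives (8.38). It suffices to combine
(8.38) with (8.37) to get

$$\mathbb{P}\left[|f'(x) - \mathbb{E}f'(x)| > \sqrt{8Cr}\right] \le \frac{4C}{r}\exp\left(-\frac{Nr^2}{4c}\right) \qquad (8.41)$$

Setting $r = \frac{\delta^2}{8CN^{\zeta}}$, we arrive at (8.35).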
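
The interpolation step leading to (8.38) can be checked on a concrete pair of functions. The sketch below is only an illustration under stated assumptions: it takes $g = 0$ and $f(x) = r\sin(x\sqrt{C/r})$ on $U = [0,1]$, a choice for which $\sup|f - g| \le r$ and $\sup|f''| \le C$, and verifies on a grid that $|f' - g'|$ stays below $\sqrt{8Cr}$. It also confirms that $\epsilon = \sqrt{2r/C}$ minimizes $2r/\epsilon + C\epsilon$. The constants `C`, `r` and all function names are hypothetical.

```python
import math

def derivative_gap_bound(C, r):
    """Right-hand side of (8.38): sqrt(8*C*r)."""
    return math.sqrt(8.0 * C * r)

def interpolation_error(eps, C, r):
    """The quantity 2r/eps + C*eps that is optimized over eps."""
    return 2.0 * r / eps + C * eps

C, r = 2.0, 1e-3                      # hypothetical constants
k = math.sqrt(C / r)                  # frequency for which |f''| <= C
xs = [i * 1e-3 for i in range(1001)]  # grid on U = [0, 1]

# f(x) = r*sin(k*x), g(x) = 0
sup_f = max(abs(r * math.sin(k * x)) for x in xs)           # sup |f - g|
sup_f1 = max(abs(r * k * math.cos(k * x)) for x in xs)      # sup |f' - g'|
sup_f2 = max(abs(r * k * k * math.sin(k * x)) for x in xs)  # sup |f''|

eps_opt = math.sqrt(2.0 * r / C)
best = interpolation_error(eps_opt, C, r)
print(sup_f1, derivative_gap_bound(C, r), best)
```

In this example the actual derivative gap is $\sqrt{Cr}$, so the bound $\sqrt{8Cr}$ holds with room to spare; the sketch makes no claim about sharpness of the constant.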
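We will now use Lemma 8.10 to control $\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu$. We define

$$f(x) \equiv \frac{1}{\beta N}\ln\int d^Mz\; e^{\beta N x z_\mu}\, e^{-\beta N\Phi_{\beta,N,M}(z)} \qquad (8.42)$$

and denote by $\mathbb{E}_{\tilde{\mathcal{Q}}^{(x)}}$ the corresponding modified
expectation. As has by now been shown many times [T2,BG5,T4], $f(x)$
verifies (8.34). Moreover,
$f'(x) = \mathbb{E}_{\tilde{\mathcal{Q}}^{(x)}} z_\mu$ and

$$f''(x) = \beta N\,\mathbb{E}_{\tilde{\mathcal{Q}}^{(x)}}\left(z_\mu - \mathbb{E}_{\tilde{\mathcal{Q}}^{(x)}} z_\mu\right)^2 \qquad (8.43)$$

Of course the addition of the linear term to $\Phi$ does not change its
second derivative, so that we can apply the Brascamp-Lieb inequalities
also to the measure $\mathbb{E}_{\tilde{\mathcal{Q}}^{(x)}}$. This shows that

$$\mathbb{E}_{\tilde{\mathcal{Q}}^{(x)}}\left(z_\mu - \mathbb{E}_{\tilde{\mathcal{Q}}^{(x)}} z_\mu\right)^2 \le \frac{1}{\epsilon\beta N} \qquad (8.44)$$

which means that $f(x)$ has a second derivative bounded by $c = \frac{1}{\epsilon}$.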
Remark. In the sequel we will use Lemma 8.10 only in situations 
where p is irrelevantly small compared to the main term in (8.35). We 
will thus ignore its existence for simplicity. 
This gives the 

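Corollary 8.11. Under the assumptions of Theorem 8.1, there are
finite positive constants $c$, $C$ such that, for any $\zeta < \frac12$ and
$\delta < N^{\zeta/2}$, for any $\mu$,

$$\mathbb{P}\left[\left|\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu - \mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu\right| \ge \delta N^{-\zeta/2}\right] \le \frac{C}{\delta^2}N^{\zeta}\exp\left(-\frac{\delta^4N^{1-2\zeta}}{c}\right) \qquad (8.45)$$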



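This leaves us only with the control of
$\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu$. But by symmetry,
for all $\mu > 1$,
$\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu = \mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}} z_2$,
while on the other hand

$$\sum_{\mu=2}^{M}\left(\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu\right)^2 \le c\,\alpha\,(m^*)^2 \qquad (8.46)$$

so that $|\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu|$ is at most of order
$N^{-1/2}$. Therefore, with probability of order, say,
$1 - \exp(-N^{1-2\zeta})$ it is true that for all $\mu \ge 2$,
$|\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu| \le \delta N^{-\zeta/2}$.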
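Finally we must control the behaviour of the prospective variance
of our gaussian. We set
$T_N \equiv \sum_{\mu=2}^{M}(\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu)^2$.
Let us introduce

$$g(x) \equiv \frac{1}{\beta N}\ln\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\beta N x(\hat z,\hat z')} \qquad (8.47)$$

where $\mathbb{E}_{\tilde{\mathcal{Q}}}$ is understood as the product measure for
the two independent copies $z$ and $z'$, and $\hat z = (0, z_2, z_3, \dots)$
as in Lemma 8.8. The point is that $T_N = g'(0)$. On the other hand, $g$
satisfies the same self-averaging conditions as the function $f$ before,
and its second derivative is bounded (for $x \le \epsilon/2$), since

$$g''(x) = \beta N\,\mathbb{E}_{\tilde{\mathcal{Q}}^{(x)}}\left((\hat z,\hat z') - \mathbb{E}_{\tilde{\mathcal{Q}}^{(x)}}(\hat z,\hat z')\right)^2 \qquad (8.48)$$

is bounded by a constant multiple of
$\mathbb{E}_{\tilde{\mathcal{Q}}^{(x)}}\|\hat z\|_2^2$ and hence by a constant.
Here $\mathbb{E}_{\tilde{\mathcal{Q}}^{(x)}}$ stands for the coupled measure
corresponding to (8.47) (and is not the same as the measure with the
same name in (8.43)). Thus we get our second corollary: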

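Corollary 8.12. Under the assumptions of Theorem 8.1, there are
finite positive constants $c$, $C$ such that, for any $\zeta < \frac12$ and
$\delta < N^{\zeta/2}$,

$$\mathbb{P}\left[\left|T_N - \mathbb{E}T_N\right| \ge \delta N^{-\zeta/2}\right] \le \frac{C}{\delta^2}N^{\zeta}\exp\left(-\frac{\delta^4N^{1-2\zeta}}{c}\right) \qquad (8.49)$$

Thus $T_N$ converges almost surely to a constant if $\mathbb{E}T_N$ converges.
We are now in a position to prove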



Lemma 8.13. Consider the random variables

$$X_N \equiv \frac{1}{\sqrt{\mathbb{E}T_N}}\sum_{\mu=2}^{M(N)}\eta_\mu\,\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu$$

Then, if the hypotheses of Theorem 8.5 are satisfied, $X_N$ converges
weakly to a gaussian random variable of mean zero and variance one.



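Proof. Let us show that $\mathbb{E}e^{itX_N}$ converges to $e^{-t^2/2}$. To see this, let
$\Omega_N$ denote the subset of $\Omega$ on which the various nice things we want
to impose on $\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu$ hold; we know that the
complement of that set has measure smaller than $O(e^{-N^{1-2\zeta}})$. We write

$$\begin{aligned}
\mathbb{E}e^{itX_N} &= \mathbb{E}_\xi\left[1_{\Omega_N}\mathbb{E}_\eta e^{itX_N} + 1_{\Omega_N^c}\mathbb{E}_\eta e^{itX_N}\right] \\
&= \mathbb{E}_\xi\left[1_{\Omega_N}\prod_{\mu=2}^{M}\cos\left(\frac{t}{\sqrt{\mathbb{E}T_N}}\,\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu\right)\right] + O\left(e^{-N^{1-2\zeta}}\right)
\end{aligned} \qquad (8.50)$$

Thus the second term tends to zero rapidly and can be forgotten. On
the other hand, on $\Omega_N$,

$$\sum_{\mu=2}^{M}\left(\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu\right)^4 \le \delta^2N^{-\zeta}\sum_{\mu=2}^{M}\left(\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu\right)^2 \le \delta^2N^{-\zeta}\,c\,\alpha\,(m^*)^2 \qquad (8.51)$$

tends to zero, so that using for instance $|\ln\cos x + x^2/2| \le cx^4$ for
$|x| \le 1$,

$$\mathbb{E}_\xi 1_{\Omega_N}\mathbb{E}_\eta e^{itX_N} \le e^{-t^2/2}\sup_{\Omega_N}\exp\left(-t^2\,\frac{T_N - \mathbb{E}T_N}{2\,\mathbb{E}T_N} + c\,\frac{t^4\delta^2N^{-\zeta}}{(\mathbb{E}T_N)^2}\right) \qquad (8.52)$$

Clearly, since also $|T_N - \mathbb{E}T_N| \le \delta N^{-\zeta/2}$, the right hand side converges
to $e^{-t^2/2}$ and this proves the lemma.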
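
The mechanism of this proof can be illustrated numerically in a simplified setting. The sketch below is an illustration under explicit assumptions, not the argument of the text: the coefficients $\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu$ are replaced by fixed small numbers `a[mu]`, and $T = \sum_\mu a_\mu^2$ plays the role of both $T_N$ and $\mathbb{E}T_N$. For independent symmetric signs $\eta_\mu$ the characteristic function of $X = \sum_\mu \eta_\mu a_\mu/\sqrt{T}$ is a product of cosines, and $|\ln\cos x + x^2/2| \le c\,x^4$ on $|x| \le 1$ (the value $c = 0.12$ used here suffices on that range) controls its distance from $e^{-t^2/2}$. All names and numerical values are hypothetical.

```python
import numpy as np

def char_function(t, a):
    """E exp(i t X) for X = sum_mu eta_mu a_mu / sqrt(T), T = sum a_mu^2,
    with independent symmetric +-1 signs eta_mu: a product of cosines."""
    T = float(np.sum(a**2))
    return float(np.prod(np.cos(t * a / np.sqrt(T))))

def log_error_bound(t, a, c=0.12):
    """Bound c * t^4 * sum(a^4) / T^2 on |ln E e^{itX} + t^2/2|, valid
    when every |t a_mu| / sqrt(T) is at most 1."""
    T = float(np.sum(a**2))
    return c * t**4 * float(np.sum(a**4)) / T**2

rng = np.random.default_rng(1)
M = 2000                                          # hypothetical number of terms
a = rng.uniform(0.5, 1.5, size=M) / np.sqrt(M)    # small, comparable a_mu
for t in (0.5, 1.0, 2.0):
    phi = char_function(t, a)
    print(t, phi, np.exp(-t**2 / 2), log_error_bound(t, a))
```

The error bound scales like $\sum_\mu a_\mu^4 / T^2$, which is the quantity that (8.51) shows to be small on $\Omega_N$.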



Corollary 8.7 together with Lemma 8.13 represent the complete
analogue of Lemma 2.2 of [PST]. To derive from here the equations
(8.16)-(8.18) requires actually a little more, namely a corresponding
statement on the convergence of the derivative of $u(\tau)$. Fortunately,
this is not very hard to show.

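Lemma 8.14. Set $u(\tau) = u_1(\tau) + u_2(\tau)$, where
$u_1(\tau) = \tau\beta(\eta, \mathbb{E}_{\tilde{\mathcal{Q}}} z)$
and $u_2(\tau) = \ln\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\beta\tau(\eta,\bar z)}$.
Then under the assumption of Lemma 8.13,

(i) $\frac{d}{d\tau}u_1(\tau)$ converges weakly to a standard gaussian random
variable.

(ii) $\left|\frac{d}{d\tau}u_2(\tau) - \tau\beta^2\,\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}}\|\bar z\|_2^2\right|$ converges to zero in probability.

Proof. (i) is obvious from Lemma 8.13. To prove (ii), note that
$u_2(\tau)$ is convex and that $\frac{d^2}{d\tau^2}u_2(\tau)$ is bounded. Thus, if
$\mathrm{var}(u_2(\tau))$ is at most of order $N^{-1/2}$, then
$\mathrm{var}\left(\frac{d}{d\tau}u_2(\tau)\right)$ is at most of order $N^{-1/4}$
by Lemma 8.9. On the other hand,
$\left|\mathbb{E}u_2(\tau) - \frac{\tau^2\beta^2}{2}\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}}\|\bar z\|_2^2\right|$
is at most of order $N^{-1/2}$, by Corollary 8.7, which, together
with the boundedness of the second derivative of $u_2(\tau)$, implies that
$\left|\frac{d}{d\tau}\mathbb{E}u_2(\tau) - \tau\beta^2\,\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}}\|\bar z\|_2^2\right| \downarrow 0$.
This means that a bound of order $N^{-1/2}$ on $\mathrm{var}(u_2(\tau))$
implies the Lemma. Since we already know that $\mathbb{E}R_N^2$ is of order
$1/N$, it is enough to prove that
$\mathrm{var}\left(\mathbb{E}_{\tilde{\mathcal{Q}}}\|\bar z\|_2^2\right)$ is at most
of order $N^{-1/2}$. But this is a, by now, familiar exercise.
The point is to use that
$\mathbb{E}_{\tilde{\mathcal{Q}}}\|\bar z\|_2^2 = \frac{d}{dx}\tilde g(0)$, where

$$\tilde g(x) \equiv \frac{1}{\beta N}\ln\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\beta N x\|\bar z\|_2^2} \qquad (8.53)$$

and to prove that $\mathrm{var}(\tilde g(x))$ is at most of order $1/N$. Using
what we know about $\|\mathbb{E}_{\tilde{\mathcal{Q}}} z\|_2$,
this follows as in the case of the function $g(x)$. The proof is finished.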



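From here we can follow [PST]. Let us denote by $\mathbb{E}_{\mathcal{Q}}$ the
expectation with respect to the (conditional) induced measures
$\mathcal{Q}_{\beta,N,M}$. Note first that (8.9) implies that$^{11}$
$\mathbb{E}_{\mathcal{Q}} m_\mu = \mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu$.
On the other hand,

$$\mathbb{E}_{\mathcal{Q}} m_\mu = \frac{1}{N}\sum_{i=1}^{N}\xi_i^\mu\,\mu_{\beta,N,M}(\sigma_i) \qquad (8.54)$$

and so, by symmetry

$$\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}_{N+1}}(z_\mu) = \mathbb{E}\,\xi^\mu_{N+1}\,\mu_{\beta,N+1,M}(\sigma_{N+1}) \qquad (8.55)$$

Note that from here on we will make the $N$-dependence of our measures
explicit, as we are going to derive recursion relations. Now, $u(\tau)$ was
defined such that

$$\begin{aligned}
\mathbb{E}\,\xi^1_{N+1}\,\mu_{\beta,N+1,M}(\sigma_{N+1}) &= \mathbb{E}\,\eta_1\,\frac{e^{u(1)} - e^{u(-1)}}{e^{u(1)} + e^{u(-1)}} \\
&= \mathbb{E}\,\eta_1\tanh\left(\beta\left(\eta_1\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1 + \sqrt{\mathbb{E}T_N}\,X_N\right)\right) + o(1)
\end{aligned} \qquad (8.56)$$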

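Thus, if $\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1$ and $\mathbb{E}T_N$ converge, by
Lemma 8.13, the limit must satisfy (8.16). Of course we still need an
equation for $\mathbb{E}T_N$ which is somewhat tricky. Let us first define a
quantity $\mathbb{E}Q_N$ by

$$\mathbb{E}Q_N \equiv \mathbb{E}\tanh^2\left(\beta\left(\eta_1\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1 + \sqrt{\mathbb{E}T_N}\,X_N\right)\right) \qquad (8.57)$$

This corresponds of course to (8.17). Now note that
$T_N = \|\mathbb{E}_{\tilde{\mathcal{Q}}_N} z\|_2^2 - (\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1)^2$ and

$$\begin{aligned}
\mathbb{E}\|\mathbb{E}_{\tilde{\mathcal{Q}}_{N+1}} z\|_2^2
&= \mathbb{E}\sum_{\mu=1}^{M}\left(\frac{1}{N+1}\sum_{i=1}^{N+1}\xi_i^\mu\,\mu_{\beta,N+1,M}(\sigma_i)\right)^2 \\
&= \frac{M}{N+1}\,\mathbb{E}\left(\mu_{\beta,N+1,M}(\sigma_{N+1})\right)^2 \\
&\quad + \mathbb{E}\sum_{\mu=1}^{M}\xi^\mu_{N+1}\,\mu_{\beta,N+1,M}(\sigma_{N+1})\left(\frac{1}{N+1}\sum_{i=1}^{N}\xi_i^\mu\,\mu_{\beta,N+1,M}(\sigma_i)\right)
\end{aligned} \qquad (8.58)$$

$^{11}$ This relation is exact, if the tilted measures are considered, and it is true up
to irrelevant error terms if one considers the conditioned measures.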

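We see that the first term gives, by definition and (8.56), $\alpha\,\mathbb{E}Q_N$. For
the second term, we use the identity from [PST]

$$\sum_{\mu=1}^{M}\xi^\mu_{N+1}\left(\frac{1}{N+1}\sum_{i=1}^{N}\xi_i^\mu\,\mu_{\beta,N+1,M}(\sigma_i)\right) = \beta^{-1}\,\frac{\sum_{\tau=\pm1}u'(\tau)\,e^{u(\tau)}}{\sum_{\tau=\pm1}e^{u(\tau)}} \qquad (8.59)$$

which it is not too hard to verify. Together with Lemma 8.14 one
concludes that in law up to small errors

$$\begin{aligned}
\sum_{\mu=1}^{M}\xi^\mu_{N+1}\left(\frac{1}{N+1}\sum_{i=1}^{N}\xi_i^\mu\,\mu_{\beta,N+1,M}(\sigma_i)\right)
&= \xi^1_{N+1}\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1 + \sqrt{\mathbb{E}T_N}\,X_N \\
&\quad + \beta\,\mathbb{E}_{\tilde{\mathcal{Q}}_N}\|\bar z\|_2^2\,\tanh\beta\left(\xi^1_{N+1}\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1 + \sqrt{\mathbb{E}T_N}\,X_N\right)
\end{aligned} \qquad (8.60)$$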

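and so

$$\begin{aligned}
\mathbb{E}\|\mathbb{E}_{\tilde{\mathcal{Q}}_{N+1}} z\|_2^2 &= \alpha\,\mathbb{E}Q_N \\
&\quad + \mathbb{E}\left[\tanh\beta\left(\xi^1_{N+1}\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1 + \sqrt{\mathbb{E}T_N}\,X_N\right)\left(\xi^1_{N+1}\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1 + \sqrt{\mathbb{E}T_N}\,X_N\right)\right] \\
&\quad + \beta\,\mathbb{E}\,\mathbb{E}_{\tilde{\mathcal{Q}}_N}\|\bar z\|_2^2\,\tanh^2\beta\left(\xi^1_{N+1}\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1 + \sqrt{\mathbb{E}T_N}\,X_N\right)
\end{aligned} \qquad (8.61)$$

Using the self-averaging properties of
$\mathbb{E}_{\tilde{\mathcal{Q}}_N}\|\bar z\|_2^2$, the last term is of
course essentially equal to

$$\beta\,\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}_N}\|\bar z\|_2^2\;\mathbb{E}Q_N \qquad (8.62)$$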

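The appearance of $\mathbb{E}_{\tilde{\mathcal{Q}}_N}\|\bar z\|_2^2$ is disturbing,
as it introduces a new quantity into the system. Fortunately, it is the
last one. The point is that proceeding as above, we can show that

$$\begin{aligned}
\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}_{N+1}}\|z\|_2^2 &= \alpha
+ \mathbb{E}\left[\tanh\beta\left(\xi^1_{N+1}\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1 + \sqrt{\mathbb{E}T_N}\,X_N\right)\left(\xi^1_{N+1}\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1 + \sqrt{\mathbb{E}T_N}\,X_N\right)\right] \\
&\quad + \beta\,\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}_N}\|\bar z\|_2^2
\end{aligned} \qquad (8.63)$$

so that setting $U_N \equiv \mathbb{E}_{\tilde{\mathcal{Q}}_N}\|\bar z\|_2^2$, we get,
subtracting (8.61) from (8.63), the simple recursion

$$\mathbb{E}U_{N+1} = \alpha(1 - \mathbb{E}Q_N) + \beta(1 - \mathbb{E}Q_N)\,\mathbb{E}U_N \qquad (8.64)$$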

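From this we get (since all quantities considered are self-averaging, we
drop the $\mathbb{E}$ to simplify the notation), setting
$M_N \equiv \mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1$,

$$\begin{aligned}
T_{N+1} &= -(M_{N+1})^2 + \alpha Q_N + \beta U_N Q_N
+ \int d\mathcal{N}(g)\,\left[M_N + \sqrt{T_N}\,g\right]\tanh\beta\left(M_N + \sqrt{T_N}\,g\right) \\
&= M_{N+1}(M_N - M_{N+1}) + \beta U_N Q_N + \beta T_N(1 - Q_N) + \alpha Q_N
\end{aligned} \qquad (8.65)$$

where we used integration by parts. The complete system of recursion
relations can thus be written as

$$\begin{aligned}
M_{N+1} &= \int d\mathcal{N}(g)\,\tanh\beta\left(M_N + \sqrt{T_N}\,g\right) \\
T_{N+1} &= M_{N+1}(M_N - M_{N+1}) + \beta U_N Q_N + \beta T_N(1 - Q_N) + \alpha Q_N \\
U_{N+1} &= \alpha(1 - Q_N) + \beta(1 - Q_N)U_N \\
Q_{N+1} &= \int d\mathcal{N}(g)\,\tanh^2\beta\left(M_N + \sqrt{T_N}\,g\right)
\end{aligned} \qquad (8.66)$$

We leave it to the reader to check that the fixed points of this system
lead to the equations (8.16)-(8.18) with
$r = \lim_{N\uparrow\infty}T_N/\alpha$, $q = \lim_{N\uparrow\infty}Q_N$ and
$m_1 = \lim_{N\uparrow\infty}M_N$ (where the variable
$u = \lim_{N\uparrow\infty}U_N$ is eliminated).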
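
The recursion (8.66) is easy to iterate numerically. The following is a minimal sketch, assuming the system exactly as written above, with the gaussian integrals replaced by Gauss-Hermite quadrature. The parameter values, the starting point, the number of steps and all function names are hypothetical choices for illustration, and the `max(T, 0.0)` guard is a numerical safeguard that is not part of the recursion. Eliminating $U$ at a fixed point of (8.66) gives, by elementary algebra, $T = \alpha Q/(1 - \beta(1 - Q))^2$, which the sketch uses as a consistency check.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes and weights, normalized to the standard normal law.
_NODES, _WEIGHTS = hermegauss(80)
_WEIGHTS = _WEIGHTS / _WEIGHTS.sum()

def gauss_mean(func):
    """Approximate the integral of func(g) against dN(g) by quadrature."""
    return float(np.sum(_WEIGHTS * func(_NODES)))

def step(state, alpha, beta):
    """One application of the recursion (8.66) to the tuple (M, T, U, Q)."""
    M, T, U, Q = state
    s = np.sqrt(max(T, 0.0))          # numerical guard only: keep sqrt real
    M_new = gauss_mean(lambda g: np.tanh(beta * (M + s * g)))
    Q_new = gauss_mean(lambda g: np.tanh(beta * (M + s * g)) ** 2)
    T_new = M_new * (M - M_new) + beta * U * Q + beta * T * (1 - Q) + alpha * Q
    U_new = alpha * (1 - Q) + beta * (1 - Q) * U
    return M_new, T_new, U_new, Q_new

def iterate(alpha, beta, n_steps=500, start=(1.0, 0.0, 0.0, 1.0)):
    """Iterate the recursion n_steps times from a hypothetical start."""
    state = start
    for _ in range(n_steps):
        state = step(state, alpha, beta)
    return state

alpha, beta = 0.01, 2.0               # hypothetical small-alpha parameters
M, T, U, Q = iterate(alpha, beta)
r = T / alpha
print(M, r, Q, U)
```

Starting near $M = 1$ selects the fixed point with large overlap; whether the iteration converges for other parameter values is exactly the stability question addressed in the text, and this sketch does not settle it.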

We have dropped both the $o(1)$ errors and the fact that the parameters
$\beta$ and $\alpha$ are slightly changed on the left by terms of order $1/N$.
As explained in [T4], these things are irrelevant. Indeed,
from the localization results of the induced measures we
know a priori that for all $N$, if $\alpha$ and $\beta$ are in the appropriate domain,
the four quantities are in a well defined domain. Thus, if this domain
is attracted by the "pure" recursion (8.66), then we may choose some
function $f(N)$ tending (slowly) to infinity (e.g. $f(N) = \ln N$ would be
a good choice) and iterate $f(N)$ times; letting $N$ tend to infinity then
gives the desired convergence to the fixed point.

The necessary stability analysis, which is finally an elementary
analytical problem, can be found in [T4], Lemma 7.9, where it was
apparently carried out for the first time in rigorous form (a numerical
investigation can of course be found in [AGS]). It shows that all is well
if $\alpha\beta$ and $\gamma$ are small enough.



It is a particularly satisfying feature of the proof of Theorem 8.5
that in the process we have obtained via Corollary 8.7 and Lemma 8.13
control over the limiting probability distribution of the "mean field"
felt by an individual spin $\sigma_i$. In particular, the facts we have
gathered also prove Lemma 8.8. Indeed, since $u(\tau)$ is the logarithm of
the Laplace transform of that field we can identify it with a gaussian
of variance $\mathbb{E}\mathbb{E}_{\tilde{\mathcal{Q}}}\|\bar z\|_2^2$ and mean
$\mathbb{E}_{\tilde{\mathcal{Q}}_N} z_1 + \sqrt{\alpha r}\,g_i$, where $g_i$ is itself a
standard gaussian. Moreover, essentially the same analysis allows us to
control not only the distribution of a single field $(\xi_i, m)$, but of any finite
collection, $(\xi_i, m)_{i\in V}$, of them. From this we are able to reconstruct
the probability distribution of the Gibbs measures:

Theorem 8.15. Under the conditions of Theorem 8.5, for any finite
set $V \subset \mathbb{N}$, the corresponding marginal distributions of the Gibbs
measures $\mu_{\beta,N,M(N)}(\sigma_i = s_i, \forall i \in V)$ converge in law to

$$\prod_{i\in V}\frac{e^{\beta s_i(m_1 + \sqrt{\alpha r}\,g_i)}}{2\cosh\left(\beta(m_1 + \sqrt{\alpha r}\,g_i)\right)}$$

where $g_i$, $i \in V$, are independent standard gaussian random variables.
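
To make the statement concrete, here is a minimal sketch that draws one realisation of such a random product measure. It assumes that the limiting law is a product over $i \in V$ of factors $e^{\beta s_i h_i}/(2\cosh(\beta h_i))$ with local fields $h_i = m_1 + \sqrt{\alpha r}\,g_i$; the numerical values of `beta`, `m1`, `alpha`, `r` are hypothetical placeholders (in the text they are determined by (8.16)-(8.18)), and the function names are illustrative.

```python
import numpy as np

def sample_marginal(beta, m1, alpha, r, size, rng):
    """Draw one realisation of the random product measure on |V| = size sites.

    Returns p[i] = probability that sigma_i = +1, using independent
    standard gaussians g_i for the local fields.
    """
    g = rng.standard_normal(size)
    field = beta * (m1 + np.sqrt(alpha * r) * g)
    # exp(field) / (2 cosh(field)) written in a numerically stable form
    return 0.5 * (1.0 + np.tanh(field))

def probability(spins, p_plus):
    """Product-measure probability of a configuration s in {-1,+1}^V."""
    spins = np.asarray(spins)
    return float(np.prod(np.where(spins == 1, p_plus, 1.0 - p_plus)))

rng = np.random.default_rng(2)
# Hypothetical parameter values, chosen only for illustration.
p_plus = sample_marginal(beta=2.0, m1=0.95, alpha=0.01, r=1.3, size=3, rng=rng)
configs = [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
total = sum(probability(s, p_plus) for s in configs)
print(p_plus, total)
```

Each call to `sample_marginal` produces a different product measure, which is the sense in which the limit is a probability distribution over measures rather than a single measure.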

Remark. In the language of Newman [NS] the above theorem identifies
the limiting Aizenman-Wehr metastate$^{12}$ for our system. Note
that there seems to be no (reasonable) way to enforce almost sure
convergence of Gibbs states for $\alpha > 0$. In fact, the $g_i$ are continuous
unbounded random variables, and by choosing suitable random
subsequences $N_i$, we can construct any desired product measure as limiting
measure!! Thus in the sense of the definition of limiting Gibbs states in
Section II, we must conclude that for positive $\alpha$, all product measures
are extremal measures for our system, a statement that may seem
surprising and that misses most of the interesting information contained
in Theorem 8.15. Thus we stress that this provides an example where
the only way to express the full available information on the
asymptotics of the Gibbs measures is in terms of their probability distribution,
i.e. through metastates. Note that in our case, the metastate is
concentrated on product measures, which can be seen as a statement on
"propagation of chaos" [Sn]. Beyond the "replica symmetric regime"
this should no longer be true, and the metastate should then live on
mixtures of product measures.

$^{12}$ It would be interesting to study also the "empirical metastate".

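Proof. We will give a brief sketch of the proof of Theorem 8.15. More
details are given in [BG6]. It is a simple matter to show that

$$\mu_{\beta,N,M}(\sigma_i = s_i, \forall i \in V) = \frac{\int_{B_\rho(m^*e^1)} d^Mz\; e^{-\beta N\left[\frac{\|z\|_2^2}{2} - \frac{1}{\beta N}\sum_{i\notin V}\ln\cosh(\beta(\xi_i,z))\right]}\prod_{i\in V}e^{\beta s_i(\xi_i,z)}}{\int_{B_\rho(m^*e^1)} d^Mz\; e^{-\beta N\left[\frac{\|z\|_2^2}{2} - \frac{1}{\beta N}\sum_{i\notin V}\ln\cosh(\beta(\xi_i,z))\right]}\prod_{i\in V}2\cosh(\beta(\xi_i,z))} \qquad (8.67)$$

Note that there is, for $V$ fixed and $N$ tending to infinity, virtually
no difference between the function $\Phi_{\beta,N,M}$ and
$\frac{\|z\|_2^2}{2} - \frac{1}{\beta N}\sum_{i\notin V}\ln\cosh(\beta(\xi_i,z))$,
so we will simply pretend they are the same.
So we may write in fact

$$\mu_{\beta,N,M}(\sigma_i = s_i, \forall i \in V) = \frac{\mathbb{E}_{\tilde{\mathcal{Q}}_{N-|V|}}\, e^{\beta\sum_{i\in V}s_i(\xi_i,z)}}{\sum_{\sigma_V}\mathbb{E}_{\tilde{\mathcal{Q}}_{N-|V|}}\, e^{\beta\sum_{i\in V}\sigma_i(\xi_i,z)}} \qquad (8.68)$$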

Now we proceed as in Lemma 8.6.

$$\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\beta\sum_{i\in V}s_i(\xi_i,z)} = e^{\beta\sum_{i\in V}s_i(\xi_i,\mathbb{E}_{\tilde{\mathcal{Q}}} z)}\;\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\beta\sum_{i\in V}s_i(\xi_i,\bar z)} \qquad (8.69)$$

The second factor is controlled just as in Lemma 8.6, and up to terms
that converge to zero in probability is independent of $s_V$. It will thus
drop out in the ratio in (8.68). The exponent in the first term is
treated as in Lemma 8.8; since all the $\xi_i$, $i \in V$, are independent,
we obtain that the $(\xi_i, \mathbb{E}_{\tilde{\mathcal{Q}}} z)$ converge indeed to
independent gaussian random variables. We omit the details of the proof
of the analogue of Lemma 8.9; but note that the
$(\xi_i, \mathbb{E}_{\tilde{\mathcal{Q}}} z)$ are uncorrelated, and this is
enough to get independence in the limit (since uncorrelated jointly
gaussian random variables are independent). From here the proof of
Theorem 8.15 is obvious.
We stress that we have proven that the Gibbs measures converge
weakly in law (w.r.t. $\mathbb{P}$) to some random product measure on the
spins. Moreover it should be noted that the probabilities of local events
(i.e. the expressions considered in Theorem 8.15) in the limit are not
measurable with respect to a local sigma-algebra, since they involve
the gaussians $g_i$. These are, as we have seen, obtained in a most
complicated way from the entire set of the
$\mathbb{E}_{\tilde{\mathcal{Q}}} z_\mu$, which depend of course
on all the $\xi_i$. It is just fortunate that the covariance structure of the
family of gaussians $g_i$, $i \in V$, is actually deterministic. This means in
particular that if we take a fixed configuration of the $\xi$ and pass to the
limit, we cannot expect to converge.

Finally let us point out that to get propagation of chaos not all that
was needed to prove Theorem 8.5 is really necessary. The main fact we
used in the proof is the self-averaging of the quantity
$\mathbb{E}_{\tilde{\mathcal{Q}}}\, e^{\beta s_i(\xi_i,\bar z)}$,
i.e. essentially (ii) of Lemma 8.6, while (i) is not needed. The second
property is that $(\xi_i, \mathbb{E}_{\tilde{\mathcal{Q}}} z)$ converges in law,
while it is irrelevant what the limit would be (these random variables
might well be dependent).
Unfortunately(?), to prove (ii) of Lemma 8.6 requires more or less the
same hypotheses as everything else (i.e. we need Theorem 8.1!), so this
observation makes little difference. Thus it may be that propagation
of chaos and the exactness of the replica symmetric solution always go
together (as the results in [PST] imply).

While in our view the results presented here shed some light on the 
"mystery of the replica trick" , we are still far from understanding the 
really interesting phenomenon of "replica symmetry breaking". This 
remains a challenge for the decade to come. 